Mapping Landslide Susceptibility Using Machine Learning Algorithms and GIS: A Case Study in Shexian County, Anhui Province, China

In this study, Logistics Regression (LR), Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting Machine (GBM), and Multilayer Perceptron (MLP) machine learning algorithms are combined with GIS techniques to map landslide susceptibility in Shexian County, China. By using satellite images and various topographic and geological maps, 16 landslide susceptibility factor maps of Shexian County were initially constructed. In total, 502 landslide and random safety points were then using the “Extract Multivalues To Points” tool in ArcGIS, parameters for the 16 factors were extracted and imported into models for the five algorithms, of which 70% of samples were used for training and 30% of samples were used for verification, which makes sense for date symmetry. The Shexian grid was converted into 260130 vector points and imported into the five models, and the natural breakpoint method was used to divide the grid into four levels: low, moderate, high, and very high. Finally, by using column results gained using Area Under Curve (AUC) analysis and a grid chart, susceptibility results for mapping landslide prediction in Shexian County was compared using the five methods. Results indicate that the ratio of landslide points of high or very high levels from LR, SVM, RF, GBM, and MLP was 1.52, 1.77, 1.95, 1.83, and 1.64, and the ratio of very high landslide points to grade area was 1.92, 2.20, 2.98, 2.62, and 2.14, respectively. The success rate of training samples for the five methods was 0.781, 0.824, 0.853, 0.828, and 0.811, and prediction accuracy was 0.772, 0.803, 0.821, 0.815, and 0.803, respectively; the order of accuracy of the five algorithms was RF > SVM > MLP > GBM > LR. Our results indicate that the five machine learning algorithms have good effect on landslide susceptibility evaluation in Shexian area, with Random Forest having the best effect.

Download Full-text

Techniques for Detecting Malware Traffic: A Comprehensive Approach to Feature Selection and Classification

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39088 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1-10

Author(s):

Harsha A K

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Learning Algorithms ◽

Malware Detection ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Steady Increase ◽

Extreme Gradient Boosting

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to detect malware like packet content analysis are inefficient in dealing with encrypted data. In the absence of actual packet contents, we can make use of other features like packet size, arrival time, source and destination addresses and other such metadata to detect malware. Such information can be used to train machine learning classifiers in order to classify malicious and benign packets. In this paper, we offer an efficient malware detection approach using classification algorithms in machine learning such as support vector machine, random forest and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning algorithms are trained using the training set. These models are then evaluated against the testing set in order to assess their respective performances. We further attempt to tune the hyper parameters of the algorithms, in order to achieve better results. Random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area under the curve values of 0.9928 and 0.9998 respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.

Download Full-text

Detection of Phishing in Internet of Things Using Machine Learning Approach

International Journal of Digital Crime and Forensics ◽

10.4018/ijdcf.2021030101 ◽

2021 ◽

Vol 13 (2) ◽

pp. 1-15

Author(s):

Sameena Naaz

Keyword(s):

Machine Learning ◽

Random Forest ◽

Internet Of Things ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Phishing Attacks ◽

Prediction And Prevention ◽

Machine Learning Approach ◽

Very High

Phishing attacks are growing in the similar manner as e-commerce industries are growing. Prediction and prevention of phishing attacks is a very critical step towards safeguarding online transactions. Data mining tools can be applied in this regard as the technique is very easy and can mine millions of information within seconds and deliver accurate results. With the help of machine learning algorithms like random forest, decision tree, neural network, and linear model, we can classify data into phishing, suspicious, and legitimate. The devices that are connected over the internet, known as internet of things (IoT), are also at very high risk of phishing attack. In this work, machine learning algorithms random forest classifier, support vector machine, and logistic regression have been applied on IoT dataset for detection of phishing attacks, and then the results have been compared with previous work carried out on the same dataset as well as on a different dataset. The results of these algorithms have then been compared in terms of accuracy, error rate, precision, and recall.

Download Full-text

Feature Selection and Comparison of Machine Learning Algorithms in Classification of Grazing and Rumination Behaviour in Sheep

Sensors ◽

10.3390/s18103532 ◽

2018 ◽

Vol 18 (10) ◽

pp. 3532 ◽

Cited By ~ 16

Author(s):

Nicola Mansbridge ◽

Jurgen Mitsch ◽

Nicola Bollard ◽

Keith Ellis ◽

Giuliana Miguel-Pacheco ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Time Budget ◽

Learning Algorithms ◽

Eating Behaviour ◽

Machine Learning Algorithms ◽

Support Vector ◽

Optimum Number ◽

Eating Behaviours ◽

Adaptive Boosting

Grazing and ruminating are the most important behaviours for ruminants, as they spend most of their daily time budget performing these. Continuous surveillance of eating behaviour is an important means for monitoring ruminant health, productivity and welfare. However, surveillance performed by human operators is prone to human variance, time-consuming and costly, especially on animals kept at pasture or free-ranging. The use of sensors to automatically acquire data, and software to classify and identify behaviours, offers significant potential in addressing such issues. In this work, data collected from sheep by means of an accelerometer/gyroscope sensor attached to the ear and collar, sampled at 16 Hz, were used to develop classifiers for grazing and ruminating behaviour using various machine learning algorithms: random forest (RF), support vector machine (SVM), k nearest neighbour (kNN) and adaptive boosting (Adaboost). Multiple features extracted from the signals were ranked on their importance for classification. Several performance indicators were considered when comparing classifiers as a function of algorithm used, sensor localisation and number of used features. Random forest yielded the highest overall accuracies: 92% for collar and 91% for ear. Gyroscope-based features were shown to have the greatest relative importance for eating behaviours. The optimum number of feature characteristics to be incorporated into the model was 39, from both ear and collar data. The findings suggest that one can successfully classify eating behaviours in sheep with very high accuracy; this could be used to develop a device for automatic monitoring of feed intake in the sheep sector to monitor health and welfare.

Download Full-text

Abstract 15895: Machine Learning Algorithms to Predict Major Adverse Cardiovascular Events in Patients Undergoing Orthotopic Liver Transplantation: A Retrospective Cohort Study

Circulation ◽

10.1161/circ.142.suppl_3.15895 ◽

2020 ◽

Vol 142 (Suppl_3) ◽

Author(s):

vardhmaan jain ◽

Vikram Sharma ◽

Agam Bansal ◽

Cerise Kleb ◽

Chirag Sheth ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Cardiovascular Events ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Major Adverse Cardiovascular Events ◽

Support Vector ◽

Post Transplant ◽

Extreme Gradient Boosting ◽

All Cause Mortality

Background: Post-transplant major adverse cardiovascular events (MACE) are amongst the leading cause of death amongst orthotopic liver transplant(OLT) recipients. Despite years of guideline directed therapy, there are limited data on predictors of post-OLT MACE. We assessed if machine learning algorithms (MLA) can predict MACE and all-cause mortality in patients undergoing OLT. Methods: We tested three MLA: support vector machine, extreme gradient boosting(XG-Boost) and random forest with traditional logistic regression for prediction of MACE and all-cause mortality on a cohort of consecutive patients undergoing OLT at our center between 2008-2019. The cohort was randomly split into a training (80%) and testing (20%) cohort. Model performance was assessed using c-statistic or AUC. Results: We included 1,459 consecutive patients with mean ± SD age 54.2 ± 13.8 years, 32% female who underwent OLT. There were 199 (13.6%) MACE and 289 (20%) deaths at a mean follow up of 4.56 ± 3.3 years. The random forest MLA was the best performing model for predicting MACE [AUC:0.78, 95% CI: 0.70-0.85] as well as mortality [AUC:0.69, 95% CI: 0.61-0.76], with all models performing better when predicting MACE vs mortality. See Table and Figure. Conclusion: Random forest machine learning algorithms were more predictive and discriminative than traditional regression models for predicting major adverse cardiovascular events and all-cause mortality in patients undergoing OLT. Validation and subsequent incorporation of MLA in clinical decision making for OLT candidacy could help risk stratify patients for post-transplant adverse cardiovascular events.

Download Full-text

Evaluating Variable Selection and Machine Learning Algorithms for Estimating Forest Heights by Combining Lidar and Hyperspectral Data

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9090507 ◽

2020 ◽

Vol 9 (9) ◽

pp. 507

Author(s):

Sanjiwana Arjasakusuma ◽

Sandiaga Swahyu Kusuma ◽

Stuart Phinn

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Algorithms ◽

Principal Component ◽

Hyperspectral Data ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Forest Height ◽

Extreme Gradient Boosting

Machine learning has been employed for various mapping and modeling tasks using input variables from different sources of remote sensing data. For feature selection involving high- spatial and spectral dimensionality data, various methods have been developed and incorporated into the machine learning framework to ensure an efficient and optimal computational process. This research aims to assess the accuracy of various feature selection and machine learning methods for estimating forest height using AISA (airborne imaging spectrometer for applications) hyperspectral bands (479 bands) and airborne light detection and ranging (lidar) height metrics (36 metrics), alone and combined. Feature selection and dimensionality reduction using Boruta (BO), principal component analysis (PCA), simulated annealing (SA), and genetic algorithm (GA) in combination with machine learning algorithms such as multivariate adaptive regression spline (MARS), extra trees (ET), support vector regression (SVR) with radial basis function, and extreme gradient boosting (XGB) with trees (XGbtree and XGBdart) and linear (XGBlin) classifiers were evaluated. The results demonstrated that the combinations of BO-XGBdart and BO-SVR delivered the best model performance for estimating tropical forest height by combining lidar and hyperspectral data, with R2 = 0.53 and RMSE = 1.7 m (18.4% of nRMSE and 0.046 m of bias) for BO-XGBdart and R2 = 0.51 and RMSE = 1.8 m (15.8% of nRMSE and −0.244 m of bias) for BO-SVR. Our study also demonstrated the effectiveness of BO for variables selection; it could reduce 95% of the data to select the 29 most important variables from the initial 516 variables from lidar metrics and hyperspectral data.

Download Full-text

Development and Evaluation of the Combined Machine Learning Models for the Prediction of Dam Inflow

Water ◽

10.3390/w12102927 ◽

2020 ◽

Vol 12 (10) ◽

pp. 2927

Author(s):

Jiyeong Hong ◽

Seoro Lee ◽

Joo Hyun Bae ◽

Jimin Lee ◽

Woon Ji Park ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Multilayer Perceptron ◽

Short Term Memory ◽

Learning Algorithms ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Dam Inflow

Predicting dam inflow is necessary for effective water management. This study created machine learning algorithms to predict the amount of inflow into the Soyang River Dam in South Korea, using weather and dam inflow data for 40 years. A total of six algorithms were used, as follows: decision tree (DT), multilayer perceptron (MLP), random forest (RF), gradient boosting (GB), recurrent neural network–long short-term memory (RNN–LSTM), and convolutional neural network–LSTM (CNN–LSTM). Among these models, the multilayer perceptron model showed the best results in predicting dam inflow, with the Nash–Sutcliffe efficiency (NSE) value of 0.812, root mean squared errors (RMSE) of 77.218 m3/s, mean absolute error (MAE) of 29.034 m3/s, correlation coefficient (R) of 0.924, and determination coefficient (R2) of 0.817. However, when the amount of dam inflow is below 100 m3/s, the ensemble models (random forest and gradient boosting models) performed better than MLP for the prediction of dam inflow. Therefore, two combined machine learning (CombML) models (RF_MLP and GB_MLP) were developed for the prediction of the dam inflow using the ensemble methods (RF and GB) at precipitation below 16 mm, and the MLP at precipitation above 16 mm. The precipitation of 16 mm is the average daily precipitation at the inflow of 100 m3/s or more. The results show the accuracy verification results of NSE 0.857, RMSE 68.417 m3/s, MAE 18.063 m3/s, R 0.927, and R2 0.859 in RF_MLP, and NSE 0.829, RMSE 73.918 m3/s, MAE 18.093 m3/s, R 0.912, and R2 0.831 in GB_MLP, which infers that the combination of the models predicts the dam inflow the most accurately. CombML algorithms showed that it is possible to predict inflow through inflow learning, considering flow characteristics such as flow regimes, by combining several machine learning algorithms.

Download Full-text

Prediction of novel mouse TLR9 agonists using a random forest approach

BMC Molecular and Cell Biology ◽

10.1186/s12860-019-0241-0 ◽

2019 ◽

Vol 20 (S2) ◽

Author(s):

Varun Khanna ◽

Lei Li ◽

Johnson Fung ◽

Shoba Ranganathan ◽

Nikolai Petrovsky

Keyword(s):

Machine Learning ◽

Random Forest ◽

Correlation Coefficient ◽

Matthews Correlation Coefficient ◽

Learning Algorithms ◽

Ensemble Classifier ◽

Innate Immune ◽

Machine Learning Algorithms ◽

Support Vector ◽

Random Forest Algorithm

Abstract Background Toll-like receptor 9 is a key innate immune receptor involved in detecting infectious diseases and cancer. TLR9 activates the innate immune system following the recognition of single-stranded DNA oligonucleotides (ODN) containing unmethylated cytosine-guanine (CpG) motifs. Due to the considerable number of rotatable bonds in ODNs, high-throughput in silico screening for potential TLR9 activity via traditional structure-based virtual screening approaches of CpG ODNs is challenging. In the current study, we present a machine learning based method for predicting novel mouse TLR9 (mTLR9) agonists based on features including count and position of motifs, the distance between the motifs and graphically derived features such as the radius of gyration and moment of Inertia. We employed an in-house experimentally validated dataset of 396 single-stranded synthetic ODNs, to compare the results of five machine learning algorithms. Since the dataset was highly imbalanced, we used an ensemble learning approach based on repeated random down-sampling. Results Using in-house experimental TLR9 activity data we found that random forest algorithm outperformed other algorithms for our dataset for TLR9 activity prediction. Therefore, we developed a cross-validated ensemble classifier of 20 random forest models. The average Matthews correlation coefficient and balanced accuracy of our ensemble classifier in test samples was 0.61 and 80.0%, respectively, with the maximum balanced accuracy and Matthews correlation coefficient of 87.0% and 0.75, respectively. We confirmed common sequence motifs including ‘CC’, ‘GG’,‘AG’, ‘CCCG’ and ‘CGGC’ were overrepresented in mTLR9 agonists. Predictions on 6000 randomly generated ODNs were ranked and the top 100 ODNs were synthesized and experimentally tested for activity in a mTLR9 reporter cell assay, with 91 of the 100 selected ODNs showing high activity, confirming the accuracy of the model in predicting mTLR9 activity. Conclusion We combined repeated random down-sampling with random forest to overcome the class imbalance problem and achieved promising results. Overall, we showed that the random forest algorithm outperformed other machine learning algorithms including support vector machines, shrinkage discriminant analysis, gradient boosting machine and neural networks. Due to its predictive performance and simplicity, the random forest technique is a useful method for prediction of mTLR9 ODN agonists.

Download Full-text

Execution Assessment of Machine Learning Algorithms for Spam Profile Detection on Instagram

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/561032021 ◽

2021 ◽

Vol 10 (3) ◽

pp. 1889-1894

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Random Forest ◽

Nearest Neighbor ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Tools ◽

Learning Models ◽

K Nearest Neighbor

Witheverypassingsecondsocialnetworkcommunityisgrowingrapidly,becauseofthat,attackershaveshownkeeninterestinthesekindsofplatformsandwanttodistributemischievouscontentsontheseplatforms.Withthefocus on introducing new set of characteristics and features forcounteractivemeasures,agreatdealofstudieshasresearchedthe possibility of lessening the malicious activities on social medianetworks. This research was to highlight features for identifyingspammers on Instagram and additional features were presentedto improve the performance of different machine learning algorithms. Performance of different machine learning algorithmsnamely, Multilayer Perceptron (MLP), Random Forest (RF), K-Nearest Neighbor (KNN) and Support Vector Machine (SVM)were evaluated on machine learning tools named, RapidMinerand WEKA. The results from this research tells us that RandomForest (RF) outperformed all other selected machine learningalgorithmsonbothselectedmachinelearningtools.OverallRandom Forest (RF) provided best results on RapidMiner. Theseresultsareusefulfortheresearcherswhoarekeentobuildmachine learning models to find out the spamming activities onsocialnetworkcommunities.

Download Full-text

PREDICTION OF POLLUTANT CONCENTRATIONS BY METEOROLOGICAL DATA USING MACHINE LEARNING ALGORITHMS

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xliv-4-w3-2020-21-2020 ◽

2020 ◽

Vol XLIV-4/W3-2020 ◽

pp. 21-27

Author(s):

K. Alpan ◽

B. Sekeroglu

Keyword(s):

Machine Learning ◽

Random Forest ◽

Smart City ◽

Urban Areas ◽

Meteorological Data ◽

Learning Algorithms ◽

Cost Effective ◽

Machine Learning Algorithms ◽

Support Vector ◽

Pollutant Concentrations

Abstract. Air pollution, which is one of the biggest problems created by the developing world, reaches severe levels, especially in urban areas. Weather stations established at certain points in countries regularly obtain data and inform people about air quality. In Smart City applications, it is aimed to perform this process with higher speed and accuracy by collecting data with thousands of sensors based on the Internet of Things. At this stage, artificial intelligence and machine learning plays a vital role in analyzing the data to be obtained. In this study, six pollutant concentrations; particulate matters (PM2.5 and PM10), nitrogen dioxide (NO2), sulfur dioxide (SO2), Ozone (O3), and carbon monoxide (CO), were predicted using three basic machine learning algorithms, namely, random forest, decision tree and support vector regression, by considering only meteorological data. Experiments on two different datasets showed that the random forest has a high prediction capacity (R2: 0.74–0.86), and high-accuracy predictions can be performed on pollutant concentrations using only meteorological data. This and further studies based on meteorological data would help to reduce the number of devices in Smart City applications and will make it more cost-effective.

Download Full-text

Modelling hydrological responses under climate change using machine learning algorithms – semi-arid river basin of peninsular India

H2Open Journal ◽

10.2166/h2oj.2020.034 ◽

2020 ◽

Vol 3 (1) ◽

pp. 481-498

Author(s):

G. Sireesha Naidu ◽

M. Pratik ◽

S. Rehana

Keyword(s):

Climate Change ◽

Machine Learning ◽

Random Forest ◽

River Basin ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Peninsular India ◽

Hydrological Responses ◽

Semi Arid

Abstract Catchment scale conceptual hydrological models apply calibration parameters entirely based on observed historical data in the climate change impact assessment. The study used the most advanced machine learning algorithms based on Ensemble Regression and Random Forest models to develop dynamically calibrated factors which can form as a basis for the analysis of hydrological responses under climate change. The Random Forest algorithm was identified as a robust method to model the calibration factors with limited data for training and testing with precipitation, evapotranspiration and uncalibrated runoff based on various performance measures. The developed model was further used to study the runoff response under climate change variability of precipitation and temperatures. A statistical downscaling model based on K-means clustering, Classification and Regression Trees and Support Vector Regression was used to develop the precipitation and temperature projections based on MIROC GCM outputs with the RCP 4.5 scenario. The proposed modelling framework has been demonstrated on a semi-arid river basin of peninsular India, Krishna River Basin (KRB). The basin outlet runoff was predicted to decrease (13.26%) for future scenarios under climate change due to an increase in temperature (0.6 °C), compared to a precipitation increase (13.12%), resulting in an overall reduction in water availability over KRB.

Download Full-text