Application of Advanced Machine Learning Algorithms to Assess Groundwater Potential Using Remote Sensing-Derived Data

2020 ◽  
Vol 12 (17) ◽  
pp. 2742
Author(s):  
Ehsan Kamali Maskooni ◽  
Seyed Amir Naghibi ◽  
Hossein Hashemi ◽  
Ronny Berndtsson

Groundwater (GW) is being exploited uncontrollably in many parts of the world, driven by the enormous demand for water supply that accompanies population growth and industrialization. Bearing in mind the importance of GW potential assessment for reaching sustainability, this study uses remote sensing (RS)-derived driving factors as inputs to advanced machine learning algorithms (MLAs), namely deep boosting and logistic model trees, to evaluate their efficiency. To do so, their results are compared with those of three benchmark MLAs: boosted regression trees, k-nearest neighbors, and random forest. For this purpose, we first assembled different topographical, hydrological, RS-based, and lithological driving factors: altitude, slope degree, aspect, slope length, plan curvature, profile curvature, relative slope position (RSP), distance from rivers, river density, topographic wetness index, land use/land cover (LULC), normalized difference vegetation index (NDVI), distance from lineament, lineament density, and lithology. The GW spring inventory was divided into training (434 springs) and validation (186 springs) subsets with a 70:30 proportion. The training dataset of springs, together with the driving factors, was fed into the MLAs, and the outputs were validated with several indices: accuracy, kappa, the receiver operating characteristic (ROC) curve, specificity, and sensitivity. Based on the area under the ROC curve, the logistic model tree (87.813%) performed similarly to deep boosting (87.807%), followed by the boosted regression trees (87.397%), random forest (86.466%), and k-nearest neighbors (76.708%) MLAs. The findings confirm the strong performance of the logistic model tree and deep boosting algorithms in modelling GW potential. Their application can thus be suggested for other areas to gain insight into GW-related barriers toward sustainability.
Further, the logistic model tree results show the strong impact of the RS-based NDVI factor (relative influence of 100) on GW potential, as well as the high influence of the distance from rivers, altitude, and RSP variables (relative influences of 46.07, 43.47, and 37.20, respectively).
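The area-under-ROC figures quoted above can be reproduced with a small helper: AUC equals the probability that a randomly chosen positive location outranks a randomly chosen negative one (the Mann-Whitney formulation). A minimal plain-Python sketch; the labels and scores below are made-up illustrations, not the study's spring data:

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a randomly chosen positive (spring) location
    receives a higher model score than a randomly chosen negative one.
    Ties are credited 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores for six validation cells (1 = spring present)
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
print(auc(labels, scores))  # 8/9 ≈ 0.889
```

A perfect ranking yields 1.0; chance-level ranking hovers around 0.5, which is why the reported 87–88% values indicate strong discrimination.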

Forests ◽  
2020 ◽  
Vol 11 (8) ◽  
pp. 830 ◽  
Author(s):  
Viet-Ha Nhu ◽  
Ayub Mohammadi ◽  
Himan Shahabi ◽  
Baharin Bin Ahmad ◽  
Nadhir Al-Ansari ◽  
...  

We used remote sensing techniques and machine learning to detect and map landslides and landslide susceptibility in the Cameron Highlands, Malaysia. We located 152 landslides using a combination of interferometric synthetic aperture radar (InSAR), Google Earth (GE), and field surveys. Of the total landslide locations, 80% (122 landslides) were used to train the selected algorithms, and the remaining 20% (30 landslides) were used for validation. We employed 17 conditioning factors, including slope angle, aspect, elevation, curvature, profile curvature, stream power index (SPI), topographic wetness index (TWI), lithology, soil type, land cover, normalized difference vegetation index (NDVI), distance to river, distance to fault, distance to road, river density, fault density, and road density, which were produced from satellite imagery, a geological map, soil maps, and a digital elevation model (DEM). We used these factors to produce landslide susceptibility maps with logistic regression (LR), logistic model tree (LMT), and random forest (RF) models. To assess the prediction accuracy of the models, we employed the following statistical measures: negative predictive value (NPV), sensitivity, positive predictive value (PPV), specificity, root-mean-squared error (RMSE), accuracy, and area under the receiver operating characteristic (ROC) curve (AUC). Our results indicated that the AUC was 92%, 90%, and 88% for the LMT, LR, and RF algorithms, respectively. To assess model performance, we also applied the non-parametric Friedman and Wilcoxon tests, whose results revealed no practical differences among the models in the study area. While landslide mapping in tropical environments such as the Cameron Highlands remains difficult, remote sensing (RS) combined with machine learning techniques, such as the LMT model, shows promise for landslide susceptibility mapping in the study area.
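The validation measures listed above (sensitivity, specificity, PPV, NPV, accuracy) all derive from a single confusion matrix. A minimal plain-Python sketch with invented labels, not the study's landslide data:

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV, and accuracy from binary
    labels (1 = landslide, 0 = non-landslide)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # true-positive rate
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / len(y_true),
    }

# Toy validation set: 8 cells, one false negative and one false positive
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
m = confusion_metrics(y_true, y_pred)
print(m["sensitivity"], m["specificity"])  # 0.75 0.75
```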


2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Matthijs Blankers ◽  
Louk F. M. van der Post ◽  
Jack J. M. Dekker

Abstract Background Accurate prediction models for whether patients on the verge of a psychiatric crisis need hospitalization are lacking, and machine learning methods may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression), in predicting psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model to optimize the accuracy, and we explore individual predictors of hospitalization. Methods Data from 2084 patients included in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. The target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics, and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared, and we also estimated the relative importance of each predictor variable. The best- and worst-performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis, and the five best-performing algorithms were combined in an ensemble model using stacking. Results All models performed above chance level. We found Gradient Boosting to be the best-performing algorithm (AUC = 0.774) and K-Nearest Neighbors the worst-performing (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In a net reclassification improvement analysis, Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%; GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions Gradient Boosting led to the highest predictive accuracy and AUC, while GLM/logistic regression performed about average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was in most cases modest. The results show that a predictive accuracy similar to that of the best-performing model can be achieved by combining multiple algorithms in an ensemble model.
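The net reclassification improvement analysis mentioned above asks how often a second model moves patients' predicted risks in the right direction: upward for patients who were hospitalized, downward for those who were not. A minimal sketch of the category-free (continuous) NRI variant in plain Python, with toy probabilities rather than the study's data:

```python
def net_reclassification_improvement(y, p_old, p_new):
    """Continuous (category-free) NRI: among events, credit cases the new
    model moves upward in predicted risk; among non-events, credit cases
    it moves downward. Ranges from -2 to +2."""
    ev = [(o, n) for t, o, n in zip(y, p_old, p_new) if t == 1]
    ne = [(o, n) for t, o, n in zip(y, p_old, p_new) if t == 0]
    up_ev = sum(1 for o, n in ev if n > o) / len(ev)
    down_ev = sum(1 for o, n in ev if n < o) / len(ev)
    up_ne = sum(1 for o, n in ne if n > o) / len(ne)
    down_ne = sum(1 for o, n in ne if n < o) / len(ne)
    return (up_ev - down_ev) + (down_ne - up_ne)

# Toy data: the new model raises both events' risks, but moves the
# two non-events in opposite directions (one right, one wrong).
y = [1, 1, 0, 0]
p_old = [0.6, 0.4, 0.5, 0.3]
p_new = [0.7, 0.5, 0.4, 0.4]
print(net_reclassification_improvement(y, p_old, p_new))  # 1.0
```

The study's percentage differences (e.g. Gradient Boosting over GLM by 2.9%) correspond to small positive NRI values on this scale.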


2019 ◽  
Author(s):  
Matthijs Blankers ◽  
Louk F. M. van der Post ◽  
Jack J. M. Dekker

Abstract Background: It is difficult to accurately predict whether a patient on the verge of a potential psychiatric crisis will need to be hospitalized. Machine learning may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate and compare the accuracy of ten machine learning algorithms, including the commonly used generalized linear model (GLM/logistic regression), in predicting psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact, and we explore the most important predictor variables of hospitalization. Methods: Data from 2,084 patients with at least one reported psychiatric crisis care contact included in the longitudinal Amsterdam Study of Acute Psychiatry were used. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared, and we also estimated the relative importance of each predictor variable. The best- and worst-performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis. The target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion in the study. The 39 predictor variables were related to patients’ socio-demographics, clinical characteristics, and previous mental health care contacts. Results: We found Gradient Boosting to perform best (AUC = 0.774) and K-Nearest Neighbors to perform worst (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was above average among the tested algorithms. In a net reclassification improvement analysis, Gradient Boosting outperformed GLM/logistic regression and K-Nearest Neighbors, and GLM outperformed K-Nearest Neighbors, although the differences between Gradient Boosting and GLM/logistic regression were small. Nine of the top-10 most important predictor variables were related to previous mental health care use.
Conclusions: Gradient Boosting led to the highest predictive accuracy and AUC, while GLM/logistic regression performed about average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was modest. Future studies may consider combining multiple algorithms in an ensemble model for optimal performance and to mitigate the risk of choosing a suboptimal algorithm.
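The ensemble suggested in the conclusions can be sketched as stacking: the base models' predicted probabilities become input features for a simple meta-learner. A plain-Python illustration with a logistic meta-learner trained by gradient descent; the probabilities are made up, and a real pipeline would use out-of-fold base predictions to avoid leakage:

```python
import math

def fit_stacker(base_preds, y, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-learner on base-model probabilities
    via stochastic gradient descent on the logistic loss (a minimal
    stand-in for stacking)."""
    n_models = len(base_preds[0])
    w = [0.0] * n_models
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(base_preds, y):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - t  # gradient of the logistic loss w.r.t. z
            b -= lr * g
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
    return w, b

def predict(w, b, x):
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# Rows: [gradient-boosting prob, KNN prob] for five patients (made up);
# y = 1 means the patient was hospitalized within 12 months.
X = [[0.9, 0.6], [0.8, 0.7], [0.7, 0.4], [0.3, 0.5], [0.2, 0.4]]
y = [1, 1, 1, 0, 0]
w, b = fit_stacker(X, y)
print(predict(w, b, [0.85, 0.6]) > 0.5)  # True
```

The meta-learner learns how much weight to give each base model, which is why a stacked ensemble can match or exceed its best member.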


2019 ◽  
Vol 16 (10) ◽  
pp. 4425-4430 ◽  
Author(s):  
Devendra Prasad ◽  
Sandip Kumar Goyal ◽  
Avinash Sharma ◽  
Amit Bindal ◽  
Virendra Singh Kushwah

Machine learning is a growing area of computer science. This article focuses on prediction analysis using the K-Nearest Neighbors (KNN) machine learning algorithm: data in the dataset are processed, analyzed, and predicted using the specified algorithm. Various machine learning algorithms, along with their pros and cons, are introduced. The KNN algorithm is studied in detail and implemented on the specified data with certain parameters. The work elucidates prediction analysis and demonstrates the prediction of restaurant quality.
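The KNN classifier described above reduces to a few lines: store the training data and label a query by majority vote among its k nearest neighbors. A minimal sketch with hypothetical restaurant features (average price and rating), not the article's dataset:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    examples under Euclidean distance — the core of the KNN algorithm."""
    dists = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical restaurant features: [avg price, avg rating]
train_X = [[10, 4.5], [12, 4.2], [30, 2.1], [28, 2.4], [11, 4.8]]
train_y = ["good", "good", "poor", "poor", "good"]
print(knn_predict(train_X, train_y, [13, 4.0]))  # good
```

KNN has no training phase beyond storing the data, which makes it simple to apply but sensitive to the choice of k and to feature scaling.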


TEM Journal ◽  
2021 ◽  
pp. 1385-1389
Author(s):  
Phong Thanh Nguyen

Machine learning is a subfield of artificial intelligence (AI). The K-Nearest Neighbors (KNN) approach, a supervised learning algorithm, is among the most widely used machine learning algorithms. This paper applied the KNN algorithm to predict the construction price index based on Vietnam's socio-economic variables. The data used to build the prediction model covered the period from 2016 to 2019 and comprised seven socio-economic variables that impact the construction price index (industrial production, construction investment capital, Vietnam’s stock price index, consumer price index, foreign exchange rate, total exports, and imports). The results showed that the construction price index prediction model based on KNN regression has smaller errors than the traditional method.
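KNN regression, as used above, predicts a continuous target as the average of the k nearest neighbors' target values rather than a majority vote. A minimal sketch with illustrative numbers, not the study's Vietnamese socio-economic data:

```python
import math

def knn_regress(train_X, train_y, query, k=3):
    """Predict a continuous target as the mean of the targets of the
    k nearest training examples (KNN regression)."""
    dists = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    return sum(y for _, y in dists[:k]) / k

# Toy rows: [consumer price index, exchange rate] -> construction price
# index (hypothetical values standing in for the socio-economic inputs)
X = [[100, 22.0], [102, 22.5], [104, 23.0], [98, 21.8]]
y = [110.0, 113.0, 116.0, 108.0]
print(knn_regress(X, y, [101, 22.3], k=2))  # 111.5
```

In practice the raw variables would be normalized first, since Euclidean distance is dominated by whichever feature has the largest scale.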


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8764 ◽  
Author(s):  
Siroj Bakoev ◽  
Lyubov Getmantseva ◽  
Maria Kolosova ◽  
Olga Kostyunina ◽  
Duane R. Chartier ◽  
...  

Industrial pig farming is associated with negative technological pressure on the bodies of pigs, and leg weakness and lameness are sources of significant economic loss in raising pigs. It is therefore important to identify the predictors of limb condition. This work assesses the state of pigs' limbs from indicators of growth and meat characteristics using machine learning algorithms. We evaluated and compared the prediction accuracy of nine ML classification algorithms (Random Forest, K-Nearest Neighbors, Artificial Neural Networks, C50Tree, Support Vector Machines, Naive Bayes, Generalized Linear Models, Boost, and Linear Discriminant Analysis) and identified Random Forest and K-Nearest Neighbors as the best-performing algorithms for predicting pig leg weakness from a small set of simple measurements that can be taken at an early stage of animal development. Measurements of Muscle Thickness, Back Fat amount, and Average Daily Gain were found to be significant predictors of the conformation of pig limbs. Our work demonstrates the utility and relative ease of using machine learning algorithms to assess the state of limbs in pigs based on growth rate and meat characteristics.


2020 ◽  
Vol 20 (04) ◽  
pp. 2050033
Author(s):  
Matthew R. Kiley ◽  
Md Shafaeat Hossain

Image creation and retention are growing at an exponential rate. Individuals produce more images today than at any time in history, and these images often contain family members. In this paper, we develop a framework to detect and identify family members in a face image dataset. The ability to identify family in a dataset of images could have a critical impact on finding lost and vulnerable children, identifying terror suspects, analyzing social media interactions, and other practical applications. We evaluated our framework by performing experiments on two facial image datasets, Y-Face and KinFaceW, comprising 37 and 920 images, respectively. We tested two feature extraction techniques, namely PCA and HOG, and three machine learning algorithms, namely K-Means, agglomerative hierarchical clustering, and K-nearest neighbors. We achieved promising results with a maximum detection rate of 94.59% using K-Means, 89.18% with agglomerative clustering, and 77.42% using K-nearest neighbors.
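K-Means, the best-performing algorithm above, can be sketched as plain Lloyd iterations: assign each point to its nearest centroid, recompute the centroids, and repeat. A minimal illustration on made-up 2-D vectors standing in for extracted face features (not the Y-Face or KinFaceW data):

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's algorithm: alternate nearest-centroid assignment and
    centroid recomputation for a fixed number of iterations.
    Initialization here is naive (the first k points)."""
    centroids = points[:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        centroids = [
            [sum(xs) / len(c) for xs in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Toy 2-D feature vectors: two tight groups, as if from two families
pts = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [5.0, 5.1], [5.2, 4.9]]
cent, clus = kmeans(pts, 2)
print(sorted(len(c) for c in clus))  # [2, 3]
```

Being unsupervised, K-Means needs no kinship labels, which fits the family-grouping task; the framework's detection rate then measures how well the recovered clusters match the true families.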


Effort estimation is a crucial step that leads to duration and cost estimation in software development. Estimates made in the initial stage of a project are based on requirements and may determine the project's success or failure: accurate estimates lead to success, inaccurate ones to failure. No single method produces accurate estimates in all cases. In this work, we apply two machine learning techniques, linear regression and K-nearest neighbors, to predict software effort using the COCOMO81, COCOMONasa, and COCOMONasa2 datasets, and we compare the results obtained from the two methods. In each dataset, 80% of the data was used for training and the remaining 20% as the test set. The correlation coefficient, mean squared error (MSE), and mean magnitude of relative error (MMRE) are used as performance metrics. The experimental results show that these models forecast software effort accurately.
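The MSE and MMRE metrics named above are straightforward to compute. A minimal sketch with hypothetical effort values rather than the COCOMO datasets:

```python
def mse(actual, predicted):
    """Mean squared error between actual and predicted effort."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def mmre(actual, predicted):
    """Mean magnitude of relative error: the mean of
    |actual - predicted| / actual, a standard effort-estimation metric."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical efforts in person-months (illustrative, not COCOMO data)
actual    = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 36.0]
print(round(mse(actual, predicted), 3))   # 8.0
print(round(mmre(actual, predicted), 3))  # 0.133
```

MMRE is popular in effort estimation because it normalizes errors by project size, so a 2 person-month miss on a small project weighs more than the same miss on a large one.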


Learning analytics refers to the use of machine learning to provide predictions of learner success and prescriptions to learners and teachers. The main goal of this paper is to propose the APTITUDE framework for learning data classification in order to achieve adaptation and recommendation of course content or the flow of course activities. The framework applies a model for student learning prediction based on machine learning. Five machine learning algorithms are used for learning data classification: random forest, naïve Bayes, k-nearest neighbors, logistic regression, and support vector machines.


A real-time crash predictor system determines the frequency and severity of crashes. Machine learning-based methods are now used to predict the total number of crashes. In this project, the prediction accuracy of machine learning algorithms such as decision tree (DT), K-nearest neighbors (KNN), random forest (RF), and logistic regression (LR) is evaluated. The performance of these classification methods is assessed in terms of accuracy. The dataset used in this project was obtained from 49 states of the US and 27 states of India, containing 2.25 million US accident crash records and 1.16 million Indian crash records, respectively. The results show that random forest (RF) achieves 96% classification accuracy, higher than the other classification methods.

