Machine Learning: Using Optimized KNN (K-Nearest Neighbors) to Predict the Facies Classifications

2019 ◽  
Author(s):  
Hadyan Pratama

2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Matthijs Blankers ◽  
Louk F. M. van der Post ◽  
Jack J. M. Dekker

Abstract Background Accurate models for predicting whether patients on the verge of a psychiatric crisis will need hospitalization are lacking, and machine learning methods may help improve their accuracy. In this paper we evaluate the accuracy of ten machine learning algorithms, including the generalized linear model (GLM/logistic regression), for predicting psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact. We also evaluate an ensemble model built to optimize accuracy, and we explore individual predictors of hospitalization. Methods Data from 2084 patients in the longitudinal Amsterdam Study of Acute Psychiatry with at least one reported psychiatric crisis care contact were included. The target variable for the prediction models was whether the patient was hospitalized in the 12 months following inclusion. The predictive power of 39 variables related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts was evaluated. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared, and the relative importance of each predictor variable was estimated. The best and worst performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis, and the five best performing algorithms were combined into an ensemble model using stacking. Results All models performed above chance level. Gradient Boosting was the best performing algorithm (AUC = 0.774) and K-Nearest Neighbors the worst (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was slightly above average among the tested algorithms. In the net reclassification improvement analysis, Gradient Boosting outperformed GLM/logistic regression by 2.9% and K-Nearest Neighbors by 11.3%, while GLM/logistic regression outperformed K-Nearest Neighbors by 8.7%. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions Gradient Boosting yielded the highest predictive accuracy and AUC, while GLM/logistic regression performed about average among the tested algorithms. Although statistically significant, the differences between the machine learning algorithms were in most cases modest. The results show that a predictive accuracy similar to that of the best performing model can be achieved by combining multiple algorithms in an ensemble model.
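As an illustration of the workflow described in this abstract, the following is a minimal sketch of comparing several classifiers by cross-validated AUC and combining them with stacking, using scikit-learn. The CSV file name, the target column `hospitalized_12m`, and the chosen hyperparameters are assumptions for the example; the paper's 39 predictors and preprocessing are not reproduced here.

```python
# Minimal sketch: compare classifiers by cross-validated AUC, then stack them.
# File name, column names and hyperparameters below are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("asap_crisis_contacts.csv")                     # hypothetical data file
X, y = df.drop(columns=["hospitalized_12m"]), df["hospitalized_12m"]
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "GLM/logistic": LogisticRegression(max_iter=1000),
    "GradientBoosting": GradientBoostingClassifier(),
    "KNN": KNeighborsClassifier(n_neighbors=15),
}
for name, model in models.items():
    auc = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean CV AUC = {auc:.3f}")

# Stacking ensemble: base learners combined by a logistic-regression meta-learner.
stack = StackingClassifier(
    estimators=list(models.items()),
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
print("Stacked ensemble test accuracy:", stack.score(X_test, y_test))
```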


2019 ◽  
Author(s):  
Matthijs Blankers ◽  
Louk F. M. van der Post ◽  
Jack J. M. Dekker

Abstract Background: It is difficult to accurately predict whether a patient on the verge of a potential psychiatric crisis will need to be hospitalized, and machine learning may help improve the accuracy of psychiatric hospitalization prediction models. In this paper we evaluate and compare the accuracy of ten machine learning algorithms, including the commonly used generalized linear model (GLM/logistic regression), for predicting psychiatric hospitalization in the first 12 months after a psychiatric crisis care contact, and we explore the most important predictor variables of hospitalization. Methods: Data from 2,084 patients with at least one reported psychiatric crisis care contact included in the longitudinal Amsterdam Study of Acute Psychiatry were used. The accuracy and area under the receiver operating characteristic curve (AUC) of the machine learning algorithms were compared, and the relative importance of each predictor variable was estimated. The best and worst performing algorithms were compared with GLM/logistic regression using net reclassification improvement analysis. The target variable for the prediction models was whether or not the patient was hospitalized in the 12 months following inclusion in the study. The 39 predictor variables were related to patients’ socio-demographics, clinical characteristics and previous mental health care contacts. Results: Gradient Boosting performed best (AUC = 0.774) and K-Nearest Neighbors worst (AUC = 0.702). The performance of GLM/logistic regression (AUC = 0.76) was above average among the tested algorithms. In the net reclassification improvement analysis, Gradient Boosting outperformed GLM/logistic regression and K-Nearest Neighbors, and GLM outperformed K-Nearest Neighbors, although the differences between Gradient Boosting and GLM/logistic regression were small. Nine of the top-10 most important predictor variables were related to previous mental health care use. Conclusions: Gradient Boosting led to the highest predictive accuracy and AUC, while GLM/logistic regression performed about average among the tested algorithms. Although statistically significant, the magnitude of the differences between the machine learning algorithms was modest. Future studies may consider combining multiple algorithms in an ensemble model for optimal performance and to mitigate the risk of choosing a suboptimally performing algorithm.
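The net reclassification improvement (NRI) analysis mentioned in both versions of this abstract can be illustrated with a small function. The sketch below implements the category-free (continuous) NRI, one common variant for comparing two risk models; the exact variant the authors used is not stated here, and the toy probabilities are purely illustrative.

```python
# Minimal sketch: category-free (continuous) Net Reclassification Improvement
# of model `p_new` over model `p_old` for a binary outcome. Illustrative only.
import numpy as np

def continuous_nri(y_true, p_old, p_new):
    """NRI = (P(up|event) - P(down|event)) + (P(down|nonevent) - P(up|nonevent))."""
    y_true, p_old, p_new = map(np.asarray, (y_true, p_old, p_new))
    up, down = p_new > p_old, p_new < p_old
    events, nonevents = y_true == 1, y_true == 0
    nri_events = up[events].mean() - down[events].mean()
    nri_nonevents = down[nonevents].mean() - up[nonevents].mean()
    return nri_events + nri_nonevents

# Toy example with made-up predicted probabilities:
y = np.array([1, 0, 1, 0, 1])
print(continuous_nri(y,
                     p_old=np.array([0.4, 0.3, 0.5, 0.6, 0.7]),
                     p_new=np.array([0.6, 0.2, 0.7, 0.5, 0.8])))
```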


2021 ◽  
Vol 2090 (1) ◽  
pp. 012115
Author(s):  
Eraldo Pereira Marinho

Abstract A machine learning approach is presented to find the optimal anisotropic SPH kernel, whose compact support is an ellipsoid matched to the convex hull of the self-regulating k-nearest neighbors of the smoothing particle (the query).
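To give a rough sense of the geometry involved, the sketch below derives an ellipsoidal support for a particle from the covariance of its k nearest neighbours. This is only a simple illustration of an anisotropic neighbour-based support, not the self-regulating scheme described in the paper; the particle positions and k are toy values.

```python
# Minimal sketch: ellipsoidal support from the covariance of the k nearest neighbours.
# Toy data; this is not the paper's self-regulating algorithm.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
particles = rng.normal(size=(1000, 3))          # toy particle positions
tree = cKDTree(particles)

def support_ellipsoid(query, k=32):
    """Return principal axes (rows) and semi-axis lengths of the neighbour ellipsoid."""
    _, idx = tree.query(query, k=k)
    neigh = particles[idx] - particles[idx].mean(axis=0)
    cov = neigh.T @ neigh / k
    eigval, eigvec = np.linalg.eigh(cov)        # principal directions of the neighbour cloud
    return eigvec.T, np.sqrt(eigval)

axes, semi_axes = support_ellipsoid(particles[0])
print(semi_axes)
```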


2021 ◽  
Vol 2021 (1) ◽  
pp. 1044-1053
Author(s):  
Nuri Taufiq ◽  
Siti Mariyah

The method used to rank the socio-economic status of households in the Unified Database (Basis Data Terpadu) is to predict household expenditure with Proxy Mean Testing (PMT). In general, this method is a prediction model based on regression techniques, with forward-stepwise selection as the chosen statistical model. In practice it is assumed that the predictor variables used in PMT are linearly correlated with the expenditure variable. This study applies a machine learning approach as an alternative prediction method to the forward-stepwise model. Models were built using several machine learning algorithms, namely Multivariate Adaptive Regression Splines (MARS), K-Nearest Neighbors, Decision Tree, and Bagging. The results show that the machine learning models produce a lower average inclusion error (IE) than average exclusion error (EE): they are effective in reducing IE but not yet sensitive enough to reduce EE. The average IE of the machine learning models is 0.21, compared with 0.29 for the PMT model.
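A minimal sketch of the ranking evaluation described above: predict household expenditure with a few regressors, flag the lowest-ranked households as poor, and compute inclusion error (IE) and exclusion error (EE) against the observed ranking. The data file, column names, and the 40% cut-off are assumptions for illustration, and MARS is omitted because scikit-learn does not provide it.

```python
# Minimal sketch: IE/EE evaluation of expenditure regressors (hypothetical data).
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor

df = pd.read_csv("household_survey.csv")                 # hypothetical data file
X, y = df.drop(columns=["expenditure"]), df["expenditure"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def ie_ee(y_true, y_pred, quantile=0.4):
    """Inclusion/exclusion error when the bottom `quantile` of the ranking is flagged poor."""
    poor_true = y_true <= np.quantile(y_true, quantile)
    poor_pred = y_pred <= np.quantile(y_pred, quantile)
    ie = np.mean(~poor_true & poor_pred)    # flagged as poor but actually not poor
    ee = np.mean(poor_true & ~poor_pred)    # actually poor but not flagged
    return ie, ee

for model in (KNeighborsRegressor(), DecisionTreeRegressor(), BaggingRegressor()):
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(type(model).__name__, ie_ee(y_te.to_numpy(), pred))
```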


Author(s):  
Hakim Hacid ◽  
Abdelkader Djamel Zighed

A multimedia index makes it possible to group data according to similarity criteria. Traditional index structures are based on trees and use the k-Nearest Neighbors (k-NN) approach to query databases. Because of some disadvantages of such an approach, the use of neighborhood graphs was proposed. This approach is interesting, but its main disadvantage is its complexity. This chapter presents one step in a longer process of analyzing, structuring, and retrieving multimedia databases. We propose an effective method for locally updating the neighborhood graphs that constitute our multimedia index. We then exploit this structure to make the retrieval process easy and effective for queries given in image form on the one hand, and, on the other hand, use the indexing structure to annotate images in order to describe their semantics. The proposed approach is based on an intelligent way of locating points in a multidimensional space. Promising results are obtained in experiments on various databases, and future extensions of the proposed approach are very relevant in this domain.
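To make the idea of a locally updated neighborhood graph concrete, here is a minimal sketch: a k-NN graph over image feature vectors with a naive local repair when a new item is inserted (only the new node's neighbourhood and its direct neighbours are recomputed). The descriptors are random toy data, and this is not the specific update algorithm proposed in the chapter.

```python
# Minimal sketch: k-NN neighbourhood graph with a naive local update on insertion.
import numpy as np
from sklearn.neighbors import NearestNeighbors

k = 5
features = np.random.rand(200, 64)                  # toy image descriptors
nn = NearestNeighbors().fit(features)

def knn_of(i):
    """k nearest neighbours of stored point i, excluding the point itself."""
    idx = nn.kneighbors(features[i].reshape(1, -1), n_neighbors=k + 1)[1][0]
    return {int(j) for j in idx if j != i}

graph = {i: knn_of(i) for i in range(len(features))}

def insert(new_vec):
    """Add a node and repair only the neighbourhoods the insertion may have changed."""
    global features, nn
    new_id = len(features)
    features = np.vstack([features, new_vec])
    nn = NearestNeighbors().fit(features)           # a real index would be updated in place
    graph[new_id] = knn_of(new_id)
    for node in list(graph[new_id]):                # revisit only the new node's neighbours
        graph[node] = knn_of(node)

insert(np.random.rand(64))
print(len(graph), "nodes indexed")
```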


2020 ◽  
Vol 12 (17) ◽  
pp. 2742
Author(s):  
Ehsan Kamali Maskooni ◽  
Seyed Amir Naghibi ◽  
Hossein Hashemi ◽  
Ronny Berndtsson

Groundwater (GW) is being exploited uncontrollably in various parts of the world because of the huge demand for water supply resulting from population growth and industrialization. Given the importance of GW potential assessment for sustainability, this study uses remote sensing (RS)-derived driving factors as input to advanced machine learning algorithms (MLAs), namely deep boosting and logistic model trees, to evaluate their efficiency. Their results are compared with three benchmark MLAs: boosted regression trees, k-nearest neighbors, and random forest. We first assembled topographical, hydrological, RS-based, and lithological driving factors: altitude, slope degree, aspect, slope length, plan curvature, profile curvature, relative slope position (RSP), distance from rivers, river density, topographic wetness index, land use/land cover (LULC), normalized difference vegetation index (NDVI), distance from lineament, lineament density, and lithology. The GW spring inventory was split into training (434 springs) and validation (186 springs) sets with a 70:30 proportion. The training springs together with the driving factors were fed into the MLAs, and the outputs were validated with several indices: accuracy, kappa, the receiver operating characteristic (ROC) curve, specificity, and sensitivity. Based on the area under the ROC curve, the logistic model tree (87.813%) performed similarly to deep boosting (87.807%), followed by boosted regression trees (87.397%), random forest (86.466%), and k-nearest neighbors (76.708%). The findings confirm the strong performance of the logistic model tree and deep boosting algorithms in modelling GW potential, so their application can be suggested for other areas to gain insight into GW-related barriers to sustainability. Further, the logistic model tree results show the high impact of the RS-based NDVI factor, with a relative influence of 100, as well as high influence of the distance from rivers, altitude, and RSP variables, with relative influences of 46.07, 43.47, and 37.20, respectively, on GW potential.
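The following sketch illustrates the general validation-and-relative-influence workflow described above. The paper's top-ranked model is a logistic model tree, which scikit-learn does not provide, so gradient boosting stands in here purely to show how AUC and relative influences (scaled so the strongest factor reads 100) can be extracted; the data file and column names are assumptions.

```python
# Minimal sketch: 70:30 split, AUC validation, and relative influence of driving factors.
# Hypothetical data file and columns; gradient boosting used as a stand-in model.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("gw_spring_factors.csv")                # hypothetical data file
X, y = df.drop(columns=["spring"]), df["spring"]         # 1 = spring present, 0 = absent
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, stratify=y, random_state=0)

gbm = GradientBoostingClassifier().fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, gbm.predict_proba(X_te)[:, 1]))

# Scale importances so the strongest factor (NDVI in the paper) reads as 100.
rel = 100 * gbm.feature_importances_ / gbm.feature_importances_.max()
print(pd.Series(rel, index=X.columns).sort_values(ascending=False).head(10))
```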


2019 ◽  
Vol 16 (10) ◽  
pp. 4425-4430 ◽  
Author(s):  
Devendra Prasad ◽  
Sandip Kumar Goyal ◽  
Avinash Sharma ◽  
Amit Bindal ◽  
Virendra Singh Kushwah

Machine learning is a growing area of computer science. This article focuses on prediction analysis using the K-Nearest Neighbors (KNN) machine learning algorithm. Data in the dataset are processed, analyzed, and predicted using the specified algorithm. Various machine learning algorithms are introduced and their pros and cons discussed. The KNN algorithm is studied in detail and implemented on the specified data with certain parameters. The research work elucidates prediction analysis and demonstrates the prediction of restaurant quality.
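A minimal sketch of an "optimized KNN" prediction-analysis workflow of the kind described here: a KNN classifier with the number of neighbours and the weighting scheme tuned by grid search. The restaurant dataset, its `quality` column, and the parameter grid are assumptions for illustration.

```python
# Minimal sketch: KNN classification with hyperparameters tuned by grid search.
# Hypothetical data file, target column, and parameter grid.
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("restaurants.csv")                      # hypothetical data file
X, y = df.drop(columns=["quality"]), df["quality"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
grid = GridSearchCV(
    pipe,
    {"kneighborsclassifier__n_neighbors": list(range(1, 31, 2)),
     "kneighborsclassifier__weights": ["uniform", "distance"]},
    cv=5, scoring="accuracy",
)
grid.fit(X_tr, y_tr)
print("best params:", grid.best_params_, "test accuracy:", grid.score(X_te, y_te))
```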


2021 ◽  
Vol 8 (2) ◽  
pp. 311
Author(s):  
Mohammad Farid Naufal

Weather is an important factor in many kinds of decision making. Manual weather classification by humans is time consuming and inconsistent. Computer vision is a branch of science in which computers recognize or classify images; it can support the development of self-autonomous machines that do not depend on an internet connection and can perform their own calculations in real time. Popular image classification algorithms include K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Convolutional Neural Network (CNN). KNN and SVM are machine learning classification algorithms, while CNN is a deep neural network classification algorithm. This study compares the performance of these three algorithms to determine the performance gap between them. The experiments use 5-fold cross validation, with several parameter settings for the KNN, SVM, and CNN algorithms. In the experiments, CNN had the best performance, with accuracy 0.942, precision 0.943, recall 0.942, and F1 score 0.942.
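As a minimal sketch of the 5-fold cross-validation comparison described above, the snippet below evaluates KNN and SVM on flattened image features and reports accuracy, precision, recall, and F1. The CNN branch is omitted, and the feature/label files are assumptions for illustration.

```python
# Minimal sketch: 5-fold CV of KNN and SVM on flattened image features.
# Hypothetical feature/label files; the CNN comparison is not reproduced here.
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X = np.load("weather_features.npy")                # hypothetical flattened image features
y = np.load("weather_labels.npy")

scoring = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]
for model in (KNeighborsClassifier(n_neighbors=5), SVC(kernel="rbf", C=1.0)):
    scores = cross_validate(model, X, y, cv=5, scoring=scoring)
    print(type(model).__name__,
          {m: round(scores["test_" + m].mean(), 3) for m in scoring})
```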

