scholarly journals Heart disease prediction model with k-nearest neighbor algorithm

Author(s):  
Tssehay Admassu Assegie

<span>In this study, the author proposed k-nearest neighbor (KNN) based heart disease prediction model. The author conducted an experiment to evaluate the performance of the proposed model. Moreover, the result of the experimental evaluation of the predictive performance of the proposed model is analyzed. To conduct the study, the author obtained heart disease data from Kaggle machine learning data repository. The dataset consists of 1025 observations of which 499 or 48.68% is heart disease negative and 526 or 51.32% is heart disease positive. Finally, the performance of KNN algorithm is analyzed on the test set. The result of performance analysis on the experimental results on the Kaggle heart disease data repository shows that the accuracy of the KNN is 91.99%</span>

Author(s):  
Rony Chowdhury Ripan ◽  
Iqbal H. Sarker ◽  
Md. Hasan Furhad ◽  
Md Musfique Anwar ◽  
Mohammed Moshiul Hoque

This paper presents an effective heart disease prediction model through detecting the anomalies, also known as outliers, in healthcare data using the unsupervised K-means clustering algorithm. Most existing approaches for detecting anomalies are based on constructing profiles of normal instances. However, such techniques require an adequate number of normal profiles to justify those models. Our proposed model first evaluates an \textit{optimal} value of K using Silhouette method. Next, it intends to locate anomalies that are far from a certain threshold distance with respect to their clusters. Finally, the five most popular classification techniques such as K-Nearest Neighbor (KNN), Random Forest (RF), Support Vector Machines (SVM), Naive Bayes (NB), and Logistic Regression (LR) are applied to build the resultant prediction model. The effectiveness of the proposed methodology is justified using a benchmark dataset of heart disease.


Author(s):  
Tsehay Admassu Assegie*

Phishing causes many problems in business industry. The electronic commerce and electronic banking such as mobile banking involves a number of online transaction. In such online transactions, we have to discriminate features related to legitimate and phishing websites in order to ensure security of the online transaction. In this study, we have collected data form phish tank public data repository and proposed K-Nearest Neighbors (KNN) based model for phishing attack detection. The proposed model detects phishing attack through URL classification. The performance of the proposed model is tested empirically and result is analyzed. Experimental result on test set reveals that the model is efficient on phishing attack detection. Furthermore, the K value that gives better accuracy is determined to achieve better performance on phishing attack detection. Overall, the average accuracy of the proposed model is 85.08%.


Author(s):  
Tsehay Admassu Assegie ◽  

Phishing causes many problems in business industry. The electronic commerce and electronic banking such as mobile banking involves a number of online transaction. In such online transactions, we have to discriminate features related to legitimate and phishing websites in order to ensure security of the online transaction. In this study, we have collected data form phish tank public data repository and proposed K-Nearest Neighbors (KNN) based model for phishing attack detection. The proposed model detects phishing attack through URL classification. The performance of the proposed model is tested empirically and result is analyzed. Experimental result on test set reveals that the model is efficient on phishing attack detection. Furthermore, the K value that gives better accuracy is determined to achieve better performance on phishing attack detection. Overall, the average accuracy of the proposed model is 85.08%.


2020 ◽  
Vol 4 (2) ◽  
pp. 39-47
Author(s):  
Junta Zeniarja ◽  
Anisatawalanita Ukhifahdhina ◽  
Abu Salam

Heart is one of the essential organs that assume a significant part in the human body. However, heart can also cause diseases that affect the death. World Health Organization (WHO) data from 2012 showed that all deaths from cardiovascular disease (vascular) 7.4 million (42.3%) were caused by heart disease. Increased cases of heart disease require a step as an early prevention and prevention efforts by making early diagnosis of heart disease. In this research will be done early diagnosis of heart disease by using data mining process in the form of classification. The algorithm used is K-Nearest Neighbor algorithm with Forward Selection method. The K-Nearest Neighbor algorithm is used for classification in order to obtain a decision result from the diagnosis of heart disease, while the forward selection is used as a feature selection whose purpose is to increase the accuracy value. Forward selection works by removing some attributes that are irrelevant to the classification process. In this research the result of accuracy of heart disease diagnosis with K-Nearest Neighbor algorithm is 73,44%, while result of K-Nearest Neighbor algorithm accuracy with feature selection method 78,66%. It is clear that the incorporation of the K-Nearest Neighbor algorithm with the forward selection method has improved the accuracy result. Keywords - K-Nearest Neighbor, Classification, Heart Disease, Forward Selection, Data Mining


2021 ◽  
Author(s):  
Ben Rahman ◽  
Harco Leslie Hendric Spits Warnars ◽  
Boy Subirosa Sabarguna ◽  
Widodo Budiharto

2020 ◽  
Vol 22 (10) ◽  
pp. 705-715 ◽  
Author(s):  
Min Liu ◽  
Guangzhong Liu

Background: Citrullination, an important post-translational modification of proteins, alters the molecular weight and electrostatic charge of the protein side chains. Citrulline, in protein sequences, is catalyzed by a class of Peptidyl Arginine Deiminases (PADs). Dependent on Ca2+, PADs include five isozymes: PAD 1, 2, 3, 4/5, and 6. Citrullinated proteins have been identified in many biological and pathological processes. Among them, abnormal protein citrullination modification can lead to serious human diseases, including multiple sclerosis and rheumatoid arthritis. Objective: It is important to identify the citrullination sites in protein sequences. The accurate identification of citrullination sites may contribute to the studies on the molecular functions and pathological mechanisms of related diseases. Methods and Results: In this study, after an encoded training set (containing 116 positive and 348 negative samples) into the feature matrix, the mRMR method was used to analyze the 941- dimensional features which were sorted on the basis of their importance. Then, a predictive model based on a self-normalizing neural network (SNN) was proposed to predict the citrullination sites in protein sequences. Incremental Feature Selection (IFS) and 10-fold cross-validation were used as the model evaluation method. Three classical machine learning models, namely random forest, support vector machine, and k-nearest neighbor algorithm, were selected and compared with the SNN prediction model using the same evaluation methods. SNN may be the best tool for citrullination site prediction. The maximum value of the Matthews Correlation Coefficient (MCC) reached 0.672404 on the basis of the optimal classifier of SNN. Conclusion: The results showed that the SNN-based prediction methods performed better when evaluated by some common metrics, such as MCC, accuracy, and F1-Measure. SNN prediction model also achieved a better balance in the classification and recognition of positive and negative samples from datasets compared with the other three models.


2021 ◽  
Vol 5 (1) ◽  
pp. 208
Author(s):  
Pandito Dewa Putra ◽  
Sukemi Sukemi ◽  
Dian Palupi Rini

Heart disease has ranked as the leading cause of death in the world, accounting for around 17.3 million deaths per year with some causes, as high blood pressure, diabetes, cholesterol fluctuation, fatigue, and some others which is collected on dataset. Heart disease dataset that was applied is cleveland heart disease with fourteen (14) data atribute. The method that was applied in this research was using Backpropagation algorithm on heart disease classifying, where will be combined Artificial Bee Colony and k-Nearest Neighbor algorithm for features or atribute choose due to this technique can increase classifier model accuracy which is produced as much as 94,23%.


Sign in / Sign up

Export Citation Format

Share Document