Chronic Kidney Disease Prediction using Machine Learning Algorithms

Kidney disease is becoming increasingly common and is now a major health issue worldwide. Poor dietary habits and insufficient water intake are among the major contributing factors. It has therefore become necessary to build a system that predicts chronic kidney disease (CKD) accurately. Here, we propose an approach for real-time kidney disease prediction. Our aim is to find the most effective and efficient machine learning (ML) method for recognizing and predicting CKD. We used data from the UCI Machine Learning Repository. Five machine learning classification techniques were considered: KNN, Logistic Regression, Random Forest Classifier, SVM, and Decision Tree Classifier. The data were divided into two parts: a training set used to fit the models and a test set used to evaluate them. The results show that the Decision Tree Classifier and Logistic Regression achieved the highest performance among the classifiers, with an accuracy of 98.75%, followed by Random Forest at 97.5%.
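The holdout evaluation described in this abstract can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the abstract does not state the split ratio, so the 400 stand-in records, the 20% test fraction, and the helper names `train_test_split` and `accuracy` are all assumptions made for the example.

```python
import random


def train_test_split(rows, test_fraction, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical stand-in records of the form (record_id, label).
records = [(i, i % 2) for i in range(400)]
train, test = train_test_split(records, test_fraction=0.2)
print(len(train), len(test))

# Arithmetic illustration only: 79 correct predictions out of 80.
y_true = [1] * 80
y_pred = [1] * 79 + [0]
print(accuracy(y_true, y_pred))
```

Under these assumed numbers, one misclassified record in an 80-record test set corresponds to an accuracy of 0.9875, which shows how coarse accuracy steps become on a small test set.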


Chronic Kidney Disease Prediction using Machine Learning Algorithms

International Journal of Preventive Medicine and Health ◽

10.35940/ijpmh.c1010.071321 ◽

2021 ◽

Vol 1 (3) ◽

pp. 1-4

Author(s):

Kallu Samatha ◽

Muppidi Rohitha Reddy ◽

Pattan Faizal Khan ◽

Rayapati Akhil Chowdary ◽

P.V.R.D Prasada Rao

Keyword(s):

Machine Learning ◽

Chronic Kidney Disease ◽

Logistic Regression ◽

Random Forest ◽

Kidney Disease ◽

Decision Tree ◽

Kidney Diseases ◽

Disease Prediction ◽

Decision Tree Classifier ◽

Tree Classifier


Swindling Shonky Anatomization of Credit Card Transactions using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d7621.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 1477-1483

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Decision Tree ◽

Credit Card ◽

Naive Bayes ◽

Gradient Boosting ◽

Decision Tree Classifier ◽

Tree Classifier ◽

Feature Importance

With rapid technological advancement, internet usage has increased sharply in every field. Online credit card transactions are now used for a wide range of payments, including online shopping, banking, bill settlement, travel and hotel booking, educational fees, hospital treatment, and supermarket purchases. This has led to fraudulent use of other people's accounts and to transactions that cause loss of service and profit to institutions. Against this background, this paper focuses on predicting fraudulent credit card transactions. The Credit Card Transaction dataset from the Kaggle machine learning repository is used for the prediction analysis, which is carried out in four steps. First, the relationships between the variables of the dataset are identified and represented graphically. Second, the feature importance of the dataset is identified using Random Forest, AdaBoost, Logistic Regression, Decision Tree, Extra Tree, Gradient Boosting, and Naive Bayes classifiers. Third, the extracted important features of the credit card transaction dataset are fitted to the Random Forest, AdaBoost, Logistic Regression, Decision Tree, Extra Tree, Gradient Boosting, and Naive Bayes classifiers. Fourth, performance is analyzed using the metrics Accuracy, FScore, AUC Score, Precision, and Recall. The implementation is done in Python using the Spyder integrated development environment under Anaconda Navigator. Experimental results show that the Decision Tree classifier achieved effective prediction, with a precision of 1.0, recall of 1.0, FScore of 1.0, AUC Score of 89.09, and accuracy of 99.92%.
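The precision, recall, and F-score metrics named in the fourth step can be computed from confusion counts, as in the sketch below. It assumes a binary labelling (1 for fraudulent, 0 for legitimate) and that "FScore" means the F1 score; the function names and the ten toy labels are hypothetical and are not taken from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1; each is 0.0 when its denominator is zero."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy labels: 3 fraudulent and 7 legitimate transactions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

In this toy case the classifier is right on 8 of 10 transactions, yet precision and recall are both 2/3, which is why these metrics are usually reported alongside accuracy when one class is rare.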


Comparison of Machine Learning With Logistic Regression for Prediction of Chronic Kidney Disease in the Thai Adult Population

Ramathibodi Medical Journal ◽

10.33165/rmj.2021.44.4.250334 ◽

2021 ◽

Vol 44 (4) ◽

pp. 1-12

Author(s):

Ratchainant Thammasudjarit ◽

Punnathorn Ingsathit ◽

Sigit Ari Saputro ◽

Atiporn Ingsathit ◽

Ammarin Thakkinstian

Keyword(s):

Neural Network ◽

Machine Learning ◽

Chronic Kidney Disease ◽

Logistic Regression ◽

Random Forest ◽

Kidney Disease ◽

Decision Tree ◽

Prediction Model ◽

Prediction Models ◽

The Neural Network

Background: Chronic kidney disease (CKD) consumes substantial treatment resources. Early detection using a risk prediction model should help identify at-risk patients and provide early treatment. Objective: To compare the performance of traditional logistic regression with machine learning (ML) in predicting the risk of CKD in the Thai population. Methods: This study used Thai Screening and Early Evaluation of Kidney Disease (SEEK) data. Seventeen features were initially considered in constructing prediction models using logistic regression and 4 ML methods (Random Forest, Naïve Bayes, Decision Tree, and Neural Network). Data were split into training and test sets at a ratio of 70:30. Model performance was assessed by estimating recall, C statistics, accuracy, F1, and precision. Results: Seven of the 17 features were included in the prediction models. The logistic regression model discriminated CKD from non-CKD patients well, with C statistics of 0.79 and 0.78 in the training and test data, respectively. The Neural Network performed best among the ML methods, followed by Random Forest, Naïve Bayes, and Decision Tree, with corresponding C statistics of 0.82, 0.80, 0.78, and 0.77 in the training data set. Performance of these corresponding models in testing data decreased about 5%, 3%, 1%, and 2% relative to the logistic model by 2%. Conclusions: A CKD risk prediction model constructed by the logit equation may yield better discrimination and a lower tendency to overfit than ML models, including the Neural Network and Random Forest.
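The C statistic reported here is, for a binary outcome, the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, which is the same quantity as the area under the ROC curve. A minimal pairwise sketch follows; the function name `c_statistic` and the six toy risk scores are assumptions for illustration and are unrelated to the SEEK data.

```python
def c_statistic(y_true, scores):
    """Concordance probability over all (case, non-case) pairs.

    A tied pair counts as half concordant.
    """
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Toy predicted risks: 1 = CKD, 0 = non-CKD.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
c = c_statistic(y_true, scores)
print(c)
```

Computing this value separately on the training and test sets, as the study does, makes any drop between the two visible as a sign of overfitting.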


Machine Learning Framework to Predict Chronic Kidney Disease using Ensemble Algorithm

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.d9107.069520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 1-6

Keyword(s):

Machine Learning ◽

Chronic Kidney Disease ◽

Random Forest ◽

Kidney Disease ◽

Decision Tree ◽

Performance Metrics ◽

Weighted Average ◽

Gradient Boosting ◽

Support Vector ◽

The Individual

Chronic Kidney Disease (CKD) is a worldwide concern that affects roughly 10% of the adult population. For most people, early diagnosis of CKD is often not possible. Modern computer-aided methods are therefore important for making conventional CKD diagnosis more effective and precise. In this project, six machine learning techniques, namely Multilayer Perceptron Neural Network, Support Vector Machine, Naïve Bayes, K-Nearest Neighbor, Decision Tree, and Logistic Regression, were used; then, to enhance performance, ensemble algorithms such as AdaBoost, Gradient Boosting, Random Forest, Majority Voting, Bagging, and Weighted Average were applied to the Chronic Kidney Disease dataset from the UCI Repository. The models were fine-tuned to find the best hyperparameters for training. Performance was evaluated using Accuracy, Precision, Recall, F1-score, Matthews Correlation Coefficient, and the ROC-AUC curve. The experiment was performed first on the individual classifiers and then on the ensemble classifiers. Ensemble classifiers such as Random Forest and AdaBoost performed better, with 100% Accuracy, Precision, and Recall, compared with the individual classifiers, where the Decision Tree algorithm obtained 99.16% Accuracy, 98.8% Precision, and 100% Recall.
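Two of the ensemble schemes listed above, Majority Voting and Weighted Average, can be illustrated without any library. The sketch below assumes binary labels and per-classifier predicted probabilities; the function names, the three hypothetical classifiers, the weights, and the 0.5 threshold are illustrative choices and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists by taking the most common label."""
    voted = []
    for labels in zip(*predictions):
        voted.append(Counter(labels).most_common(1)[0][0])
    return voted


def weighted_average(probabilities, weights, threshold=0.5):
    """Average per-classifier positive-class probabilities using weights,
    then label a sample 1 when the average reaches the threshold."""
    total = sum(weights)
    combined = []
    for probs in zip(*probabilities):
        avg = sum(w * p for w, p in zip(weights, probs)) / total
        combined.append(1 if avg >= threshold else 0)
    return combined


# Hard labels from three hypothetical classifiers on four samples.
votes = majority_vote([[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1]])
print(votes)

# Positive-class probabilities from the same classifiers on three samples.
labels = weighted_average(
    [[0.9, 0.4, 0.2], [0.6, 0.7, 0.1], [0.2, 0.8, 0.3]],
    weights=[0.5, 0.3, 0.2],
)
print(labels)
```

An odd number of voters avoids ties in the binary case; with an even number, a tie-breaking rule would have to be chosen explicitly.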


Automatic Classification of Hypertension Types Based on Personal Features by Machine Learning Algorithms

Mathematical Problems in Engineering ◽

10.1155/2020/2742781 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13 ◽

Cited By ~ 4

Author(s):

Majid Nour ◽

Kemal Polat

Keyword(s):

Machine Learning ◽

Blood Pressure ◽

Random Forest ◽

Decision Tree ◽

Systolic Blood Pressure ◽

Diastolic Blood Pressure ◽

Decision Tree Classifier ◽

Tree Classifier ◽

C4.5 Decision Tree

Hypertension (high blood pressure) is a common and important disease, and its early detection is essential for early treatment. Hypertension is defined as systolic blood pressure higher than 140 mmHg or diastolic blood pressure higher than 90 mmHg. In this paper, to detect hypertension types from personal information and features, four machine learning (ML) methods, namely the C4.5 decision tree classifier (DTC), random forest, linear discriminant analysis (LDA), and linear support vector machine (LSVM), were used and compared with each other. We are the first in the literature to classify hypertension types using classification algorithms based on personal data. To examine how results vary with the classifier type, four different classifier algorithms were selected for this problem. The hypertension dataset has eight features that describe hypertension status: sex, age, height (cm), weight (kg), systolic blood pressure (mmHg), diastolic blood pressure (mmHg), heart rate (bpm), and BMI (kg/m²). It has four classes: normal (healthy), prehypertension, stage-1 hypertension, and stage-2 hypertension. The classification accuracies obtained on the hypertension dataset are 99.5%, 99.5%, 96.3%, and 92.7% using the C4.5 decision tree classifier, random forest, LDA, and LSVM, respectively. The results show that ML methods can be used confidently for the automatic determination of hypertension types.
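As a small worked example of the quantities in this abstract, the sketch below derives BMI from the height (cm) and weight (kg) features and applies the binary definition stated above (systolic above 140 mmHg or diastolic above 90 mmHg). The function names are hypothetical. The abstract does not give the cut-offs that separate its four classes, so the sketch deliberately does not assign stages.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2, with height supplied in centimetres."""
    height_m = height_cm / 100.0
    return weight_kg / (height_m ** 2)


def is_hypertensive(systolic_mmhg, diastolic_mmhg):
    """Binary rule as stated in the abstract: strictly above 140 or 90 mmHg."""
    return systolic_mmhg > 140 or diastolic_mmhg > 90


print(round(bmi(70, 175), 2))
print(is_hypertensive(150, 85), is_hypertensive(120, 80))
```

Because the rule is a plain threshold on two of the eight features, a tree-based classifier can in principle represent it with a few splits.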


Heart Disease Prediction Using Decision Tree and Random Forest Classification Techniques

Applications of Big Data in Large- and Small-Scale Systems - Advances in Data Mining and Database Management ◽

10.4018/978-1-7998-6673-2.ch015 ◽

2021 ◽

pp. 234-259

Author(s):

Nitika Kapoor ◽

Parminder Singh

Keyword(s):

Feature Extraction ◽

Heart Disease ◽

Random Forest ◽

Decision Tree ◽

Random Forest Classifier ◽

Disease Prediction ◽

Decision Tree Classifier ◽

Hybrid Classifier ◽

Forest Classification ◽

Tree Classifier

Data mining is an approach for extracting useful information from data. Prediction analysis is an approach for predicting future possibilities from current information. The authors propose a hybrid classifier for heart disease prediction that combines a random forest and a decision tree classifier. The prediction technique has three steps: data pre-processing, feature extraction, and classification. In this research, the random forest classifier is applied for feature extraction and the decision tree classifier generates the final prediction results. The authors show the results of the proposed model using the Python platform. The results are compared with the support vector machine (SVM) and k-nearest neighbour (KNN) classifiers.


Detecting Kidney Disease using Naïve Bayes and Decision Tree in Machine Learning

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a4377.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 498-501

Keyword(s):

Machine Learning ◽

Chronic Kidney Disease ◽

Kidney Disease ◽

Decision Tree ◽

Naive Bayes ◽

Kidney Diseases ◽

Naïve Bayes ◽

Bayes Classifier ◽

Machine Learning Classification ◽

The Relationship

Chronic Kidney Disease (CKD) mostly affects patients suffering from complications of diabetes or high blood pressure and leaves them unable to carry out their daily activities. A survey revealed that one in 12 persons living in the two biggest cities of India was diagnosed with CKD features that put them at high risk of unfavourable outcomes. In this article, we analyze and predict chronic kidney disease by discovering hidden patterns of relationships using feature selection and machine learning classification approaches such as the naive Bayes classifier and the decision tree (J48). The dataset to which these approaches are applied is taken from the UC Irvine repository. Based on certain features, the approaches predict whether a person has CKD or not. In the comparative analysis, the J48 decision tree gave a high accuracy rate in prediction. The J48 classifier proves to be more efficient and effective in detecting kidney diseases.


Fake News Data Exploration and Analytics

Electronics ◽

10.3390/electronics10192326 ◽

2021 ◽

Vol 10 (19) ◽

pp. 2326

Author(s):

Mazhar Javed Awan ◽

Awais Yasin ◽

Haitham Nobanee ◽

Ahmed Abid Ali ◽

Zain Shahzad ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forest ◽

Random Forest Classifier ◽

The Internet ◽

Fake News ◽

Learning Models ◽

Decision Tree Classifier ◽

Tree Classifier ◽

Machine Learning Models

Before the internet, people got their news from radio, television, and newspapers. With the internet, news moved online, and suddenly anyone could post information on websites such as Facebook and Twitter. The spread of fake news has increased with social media and has become one of the most significant issues of this century. People use fake news to damage the reputation of well-regarded organizations for their own benefit. The main aim of this project is to build a tool that uses machine learning to examine the language patterns that characterize fake and real news. This paper proposes machine learning models that can successfully detect fake news. These models identify whether a news item is real or fake and specify the accuracy of that determination, even in a complex environment. After data preprocessing and exploration, we applied three machine learning models: random forest classifier, logistic regression, and term frequency-inverse document frequency (TF-IDF) vectorizer. The accuracy of the TF-IDF vectorizer, logistic regression, random forest classifier, and decision tree classifier models was approximately 99.52%, 98.63%, 99.63%, and 99.68%, respectively. Machine learning models can be considered a strong choice for obtaining reality-based results and can be applied to other unstructured data for various sentiment analysis applications.


BGP Anomaly Detection using Decision Tree Based Machine Learning Classifiers

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3622.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 4015-4020

Keyword(s):

Machine Learning ◽

Random Forest ◽

Decision Tree ◽

Principal Component ◽

Fine Tuning ◽

Decision Tree Classifier ◽

Data Packets ◽

Analysis Technique ◽

Tree Classifier ◽

Hardware Failures

Border Gateway Protocol (BGP) is used to send and receive data packets over the internet. Over the years, the protocol has suffered major disruptions caused by worms such as Nimda, Slammer, and Code Red, by hardware failures, and by prefix hijacking, which obstructed service for many users. Identifying anomalous messages traversing BGP, however, allows such attacks to be discovered in time. In this paper, a machine learning approach is applied to identify such BGP messages. Principal Component Analysis was applied to reduce the dimensionality down to 2 components, followed by the generation of Decision Tree, Random Forest, AdaBoost, and GradientBoosting classifiers. After fine-tuning the parameters, the random forest classifier achieved an accuracy of 97.84%, followed closely by the decision tree classifier at 97.38%. The GradientBoosting classifier gave an accuracy of 95.41% and the AdaBoost classifier gave an accuracy of 94.43%.


Exploration of Neighbor Kernels and Feature Estimators for Heart Disease Prediction using Machine Learning

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l3472.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 599-605

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Random Forest ◽

Random Forest Classifier ◽

Support Vector ◽

Classification Algorithms ◽

Disease Prediction ◽

Decision Tree Classifier ◽

Support Vector Classifier ◽

Tree Classifier

In an era of growing technology, people still suffer from various diseases, and heart disease is common across the population irrespective of age. Although technology is flourishing, the prediction and identification of heart disease remains a challenging issue. Because patient symptom data are often lacking, predicting heart disease is a difficult task. With this overview, we used the Heart Disease Prediction dataset extracted from the UCI Machine Learning Repository to analyze and compare various parameters of the classification algorithms. The parameter analysis of the classification algorithms for heart disease classes is done in five steps. First, the dataset is analyzed using the correlation matrix, feature importance analysis, the target distribution of the dataset, and disease probability based on the density distribution of age and sex. Second, the dataset is fitted to a K-Nearest Neighbor (KNN) classifier to analyze performance for various numbers of neighbors, with and without PCA. Third, the dataset is fitted to a Support Vector classifier to analyze performance for various kernels, with and without PCA. Fourth, the dataset is fitted to a Decision Tree classifier to analyze performance for various numbers of features, with and without PCA. Fifth, the dataset is fitted to a Random Forest classifier to analyze performance for various numbers of estimators, with and without PCA. The implementation is done in Python on the Spyder platform with Anaconda Navigator. Experimental results show that for the KNN classifier, 12 neighbours is found to be effective, with a score of 0.52 before applying PCA and 0.53 after applying PCA. For the Support Vector classifier, the rbf kernel is found to be effective, with a score of 0.519 with and without PCA. For the Decision Tree classifier, the score is 0.47 for 7 features before applying PCA and 0.49 for 4 features after applying PCA. For the Random Forest classifier, the score is 0.53 for 500 estimators before applying PCA and 0.52 for 500 estimators after applying PCA.
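The neighbour sweep in the second step of this abstract can be sketched with a small K-Nearest Neighbor classifier written in plain Python. This illustrates only the idea of scoring several values of k on held-out points and omits the PCA step. The toy two-dimensional points, including one deliberately mislabelled training point, and the helper names are assumptions and are unrelated to the UCI heart disease data.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the majority label among the k nearest training points."""
    distances = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    nearest = [label for _, label in distances[:k]]
    return Counter(nearest).most_common(1)[0][0]


def knn_score(train_X, train_y, test_X, test_y, k):
    """Accuracy of k-NN predictions on the test points."""
    hits = sum(
        knn_predict(train_X, train_y, x, k) == y
        for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)


# Two toy clusters plus one noisy point, (1.5, 1.5), labelled 1.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (1.5, 1.5),
           (5, 5), (5, 6), (6, 5), (6, 6)]
train_y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
test_X = [(0.5, 0.5), (5.5, 5.5), (2, 2), (4, 4)]
test_y = [0, 1, 0, 1]

# Score each candidate neighbour count on the held-out points.
scores = {k: knn_score(train_X, train_y, test_X, test_y, k) for k in (1, 3, 5)}
print(scores)
```

With k = 1 the single noisy training point decides the prediction for (2, 2), while larger k outvotes it; this sensitivity to k is what such a neighbour sweep examines.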
