Usage of KNN, Decision Tree and Random Forest Algorithms in Machine Learning and Performance Analysis with a Comparative Measure

Author(s): K. Uma Pavan Kumar, Ongole Gandhi, M. Venkata Reddy, S. V. N. Srinivasu

2020, Vol 2020, pp. 1-12
Author(s): Peter Appiahene, Yaw Marfo Missah, Ussiph Najim

The financial crisis that hit Ghana from 2015 to 2018 raised various issues concerning the efficiency of banks and the safety of depositors in the banking industry. As part of measures to improve the banking sector and restore customers’ confidence, efficiency and performance analysis in the banking industry has become a pressing concern, because stakeholders need to identify the underlying causes of inefficiency within the industry. Nonparametric methods such as Data Envelopment Analysis (DEA) have been suggested in the literature as a good measure of banks’ efficiency and performance, and machine learning algorithms have also been viewed as a good tool for estimating various nonparametric and nonlinear problems. This paper combines DEA with three machine learning approaches to evaluate bank efficiency and performance, using 444 Ghanaian bank branches as Decision Making Units (DMUs). The results were compared with the corresponding efficiency ratings obtained from DEA. Finally, the prediction accuracies of the three machine learning models were compared. The results suggested that the decision tree (DT), using the C5.0 algorithm, provided the best predictive model: it achieved 100% accuracy in predicting the 134-branch holdout sample (30% of the dataset), with a P value of 0.00. The DT was followed closely by the random forest algorithm, with a predictive accuracy of 98.5% and a P value of 0.00, and finally by the neural network, with 86.6% accuracy and a P value of 0.66. The study concluded that banks in Ghana can use its results to predict their respective efficiencies. All experiments were performed in a simulation environment and conducted in RStudio using R code.
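
The abstract does not say which DEA variant or which inputs and outputs were used. As a minimal sketch, assuming the constant-returns-to-scale (CCR) model with a single input and a single output, a branch's efficiency reduces to its output/input ratio divided by the best ratio in the sample, so best-practice branches score 1.0. The staff and loan figures below are invented for illustration; with several inputs and outputs, DEA instead solves one linear program per DMU.

```python
def ccr_efficiency(inputs, outputs):
    """Single-input, single-output CCR (constant returns to scale) DEA scores.

    With one input and one output, the CCR linear program reduces to each
    DMU's output/input ratio divided by the best ratio in the sample.
    """
    if len(inputs) != len(outputs):
        raise ValueError("inputs and outputs must have the same length")
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical branches: input = staff count, output = loans issued (made up).
staff = [10, 20, 15, 8]
loans = [50, 80, 90, 40]
scores = ccr_efficiency(staff, loans)
labels = ["efficient" if s >= 1.0 - 1e-9 else "inefficient" for s in scores]
for i, (s, lab) in enumerate(zip(scores, labels)):
    print(f"branch {i}: score={s:.3f} ({lab})")
```

Binary labels like these are the kind of DEA-derived target that a classifier such as C5.0 or a random forest could then be trained to predict; how the study actually turned its DEA scores into classes is not stated in the abstract.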


2020, Vol 4 (Supplement_1), pp. 268-269
Author(s): Jaime Speiser, Kathryn Callahan, Jason Fanning, Thomas Gill, Anne Newman, ...

Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in the diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism about these methods remains because of a lack of reproducibility and the difficulty of understanding the complex algorithms behind the models. We aim to provide an overview of two common machine learning methods, decision tree and random forest, which we focus on because they offer a high degree of interpretability. We discuss the underlying algorithms of the decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. A decision tree is a machine learning method that produces a model resembling a flow chart; a random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. In the LIFE study data, prediction models for serious fall injury were moderate at best (area under the receiver operating characteristic curve of 0.54 for the decision tree and 0.66 for the random forest). Machine learning methods may offer improved performance compared with traditional models for modeling outcomes in aging, but their use should be justified and their output carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
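
To make the "many trees whose results are aggregated" idea concrete, here is a minimal stdlib-only Python sketch. To stay short it uses depth-one trees (stumps) chosen by Gini impurity, a bootstrap sample and a random subset of about √p features per tree, and majority voting. It is a toy illustration under those simplifying assumptions, not the LIFE-study models or any library's implementation, and the data are made up.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, features):
    """Fit a one-split decision tree (a stump) using only `features`.

    Returns (feature, threshold, left_label, right_label) for the split with
    the lowest size-weighted Gini impurity of the two child nodes.
    """
    n = len(y)
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must send samples both ways
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:
        # No usable split (e.g. constant feature): predict the majority class.
        majority = Counter(y).most_common(1)[0][0]
        return (features[0], float("inf"), majority, majority)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_forest(X, y, n_trees=25, seed=0):
    """Random forest of stumps: each stump sees a bootstrap sample and a
    random subset of about sqrt(p) features."""
    rng = random.Random(seed)
    p = len(X[0])
    k = max(1, int(p ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # with replacement
        feats = rng.sample(range(p), k)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Aggregate the individual trees' predictions by majority vote."""
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]


# Toy data (made up): class 0 for small feature values, class 1 for large.
X = [[1, 1], [2, 2], [3, 3], [7, 7], [8, 8], [9, 9]]
y = [0, 0, 0, 1, 1, 1]

single_tree = fit_stump(X, y, features=[0])
forest = fit_forest(X, y)
print(single_tree)  # (0, 3, 0, 1)
print(predict_forest(forest, [0.5, 0.5]), predict_forest(forest, [10, 10]))
```

Real random forests grow deep trees rather than stumps; the stump keeps the flow-chart structure visible while still showing bagging, feature subsampling and vote aggregation.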


2020, Vol 2020, pp. 1-10
Author(s): Faizan Ullah, Qaisar Javaid, Abdu Salam, Masood Ahmad, Nadeem Sarwar, ...

Ransomware (RW) is a distinctive variety of malware that encrypts the user's files or locks the user's system, holding the files hostage, which leads to huge financial losses for users. In this article, we propose a new model that extracts novel features from an RW dataset and classifies files as RW or benign. The proposed model can detect a large number of RW samples from various families at runtime, monitoring network, registry, and file-system activity throughout execution. API-call sequences were reused to represent the behavior-based features of RW. The technique extracts a fourteen-feature vector at runtime and analyzes it with online machine learning algorithms to predict RW. To validate its effectiveness and scalability, we tested 78,550 recent malicious and benign samples and compared the model with random forest and AdaBoost; the testing accuracy reached 99.56%.
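
The abstract does not name the online learning algorithm it applies. As a hedged illustration of what "online" means here, the sketch below uses a swapped-in technique, logistic regression trained by stochastic gradient descent one sample at a time, so each new runtime feature vector can refine the model as it arrives. The 14-entry vectors, the learning rate and the simulated stream are all hypothetical and do not represent the paper's features or data.

```python
import math


class OnlineLogisticRegression:
    """Binary logistic regression updated one sample at a time (plain SGD)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        # Numerically stable logistic function.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def partial_fit(self, x, y):
        """One gradient step on the log-loss for a single sample, y in {0, 1}."""
        error = self.predict_proba(x) - y
        self.w = [wi - self.lr * error * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * error


# Hypothetical 14-dimensional behavior vectors (not the paper's features):
# the first seven entries stand in for ransomware-like activity, the last
# seven for benign activity.
N_FEATURES = 14
ransom_like = [1.0] * 7 + [0.0] * 7
benign_like = [0.0] * 7 + [1.0] * 7

model = OnlineLogisticRegression(N_FEATURES, learning_rate=0.5)
for x, label in [(ransom_like, 1), (benign_like, 0)] * 20:  # simulated stream
    model.partial_fit(x, label)

print(model.predict(ransom_like), model.predict(benign_like))  # 1 0
```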


2020, Vol 0 (0), pp. 0-0
Author(s): D-K. Kim, H-S. Lim, K.M. Eun, Y. Seo, J.K. Kim, ...

BACKGROUND: Neutrophils are major inflammatory cells in refractory chronic rhinosinusitis with nasal polyps (CRSwNP), regardless of endotype. However, their role in the pathophysiology of CRSwNP remains poorly understood. We investigated factors predicting the surgical outcomes of CRSwNP patients, focusing on neutrophil localization. METHODS: We employed machine-learning methods such as decision tree and random forest models to predict the surgical outcomes of CRSwNP. Immunofluorescence analysis was conducted to detect human neutrophil elastase (HNE), Bcl-2, and Ki-67 in NP tissues. We counted the immunofluorescence-positive cells and divided them into three groups based on the infiltrated area: epithelial, subepithelial, and perivascular. RESULTS: In the machine-learning analysis, the decision tree algorithm identified the number of subepithelial HNE-positive cells, Lund-Mackay (LM) scores, and endotype (eosinophilic or non-eosinophilic) as the most important predictors of surgical outcomes in CRSwNP patients. Additionally, when the factors were ranked by mean decrease in the Gini index or in accuracy, the random forest algorithm showed that the top three factors associated with surgical outcomes were the LM score, age, and number of subepithelial HNE-positive cells. In terms of cellular proliferation, immunofluorescence analysis revealed that Ki-67/HNE double-positive and Bcl-2/HNE double-positive cells were significantly increased in the subepithelial area in refractory CRSwNP. CONCLUSION: Our machine-learning approach and immunofluorescence analysis demonstrated that subepithelial neutrophils in NP tissues had high expression of Ki-67 and could serve as a cellular biomarker for predicting surgical outcomes in CRSwNP patients.
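
The random forest ranking above relies on mean decrease in the Gini index. As a hedged sketch of the underlying quantity: a node's Gini impurity is 1 − Σ p_k², where p_k is the share of class k in the node, and a split's decrease is the parent impurity minus the size-weighted impurity of its two children. A forest's mean decrease in Gini for a feature is, roughly, such decreases summed over every split on that feature and averaged over trees (implementations differ in how they weight and normalize). The code computes the decrease for one split; the outcome labels and the split are invented, not taken from the CRSwNP data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted_children


# Hypothetical outcomes split on a made-up threshold of a cell-count feature.
parent = ["refractory"] * 4 + ["controlled"] * 4
left = ["controlled"] * 3 + ["refractory"] * 1   # below the threshold
right = ["refractory"] * 3 + ["controlled"] * 1  # above the threshold
print(round(gini_decrease(parent, left, right), 3))  # 0.125
```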


Processes, 2021, Vol 10 (1), pp. 26
Author(s): Francois Mbonyinshuti, Joseph Nkurunziza, Japhet Niyobuhungiro, Egide Kayitare

Today’s global business trends are causing a significant and complex data revolution in the healthcare industry, culminating in the use of artificial intelligence and predictive modeling to improve health outcomes and performance. The dataset, based on consumption data from 2015 to 2019, included approximately 500 products. After a series of data pre-processing steps, the ten most-used essential medicines were chosen, namely cotrimoxazole 480 mg, amoxicillin 250 mg, paracetamol 500 mg, oral rehydration salts (O.R.S) sachet 20.5 g, chlorpheniramine 4 mg, nevirapine 200 mg, aminophylline 100 mg, artemether 20 mg + lumefantrine (AL) 120 mg, cromoglycate ophthalmic. Our study concentrated on applying machine learning (ML) to forecast future trends in the demand for essential medicines in Rwanda. The following models were created and applied: linear regression, artificial neural network, and random forest. The random forest predicted the 10 selected medicines with an accuracy of 88 percent on the training set and 76 percent on the test set, so it can be used to forecast future demand from past consumption data by inputting a month, year, district, and medicine name. According to our findings, the random forest model performed well as a forecasting model for essential-medicine demand. Finally, data-driven predictive modeling with ML could become the cornerstone of health supply chain planning and operational management.
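
The fitted random forest is described as taking a month, year, district and medicine name as inputs. Tree ensembles need these as numbers; one common approach, assumed here and not stated in the abstract, is to keep the year numeric and one-hot encode the categorical fields. The district names are hypothetical placeholders, and the medicine list is cut down to three of the medicines named above.

```python
MONTHS = list(range(1, 13))
DISTRICTS = ["District A", "District B", "District C"]  # hypothetical placeholders
MEDICINES = ["cotrimoxazole 480 mg", "amoxicillin 250 mg", "paracetamol 500 mg"]


def one_hot(value, categories):
    """Return a 0/1 indicator list with a single 1 at the value's position."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]


def encode_query(month, year, district, medicine):
    """Turn (month, year, district, medicine) into one numeric feature vector."""
    return ([year]
            + one_hot(month, MONTHS)
            + one_hot(district, DISTRICTS)
            + one_hot(medicine, MEDICINES))


features = encode_query(3, 2019, "District B", "amoxicillin 250 mg")
print(len(features))  # 19 = 1 year + 12 months + 3 districts + 3 medicines
```

A vector like this is what a fitted regressor would receive at prediction time. Trees can also split on the month as a plain integer; one-hot encoding simply avoids imposing an order on district and medicine names.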


Chronic Kidney Disease (CKD) is a worldwide concern that affects roughly 10% of the world's adult population. For most people, early diagnosis of CKD is often not possible. Therefore, modern computer-aided methods are important for making conventional CKD diagnosis more effective and accurate. In this project, six machine learning techniques, namely Multilayer Perceptron Neural Network, Support Vector Machine, Naïve Bayes, K-Nearest Neighbor, Decision Tree, and Logistic Regression, were applied, and ensemble algorithms such as AdaBoost, Gradient Boosting, Random Forest, Majority Voting, Bagging, and Weighted Average were then used to improve performance on the Chronic Kidney Disease dataset from the UCI Repository. The models were fine-tuned to find the best hyperparameters for training. Model performance was evaluated using accuracy, precision, recall, F1-score, Matthews correlation coefficient (MCC), and the ROC-AUC curve. The experiment was first performed on the individual classifiers and then on the ensemble classifiers. Ensemble classifiers such as Random Forest and AdaBoost performed better, with 100% accuracy, precision, and recall, compared with the individual classifiers, among which the Decision Tree algorithm achieved 99.16% accuracy, 98.8% precision, and 100% recall.
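
Two of the ingredients named above, the evaluation metrics and majority voting, are small enough to sketch directly. The Python below computes accuracy, precision, recall, F1 and the Matthews correlation coefficient from a binary confusion matrix and combines classifiers by hard majority vote. The labels and the three classifiers' predictions are made up and constructed so the vote happens to fix each classifier's single mistake; majority voting does not always improve on its members, and the weighted-average ensemble is not shown.

```python
import math
from collections import Counter


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F1 and Matthews correlation coefficient."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


def majority_vote(*predictions):
    """Hard majority voting: for each sample, the label most classifiers chose."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Made-up labels (1 = CKD, 0 = not CKD) and three imperfect classifiers.
y_true = [1, 1, 1, 0, 0, 0]
clf_a = [1, 1, 0, 0, 0, 1]
clf_b = [1, 0, 1, 0, 0, 0]
clf_c = [0, 1, 1, 0, 1, 0]

ensemble = majority_vote(clf_a, clf_b, clf_c)
print(ensemble)                                        # [1, 1, 1, 0, 0, 0]
print(round(binary_metrics(y_true, clf_a)["mcc"], 3))  # 0.333
print(binary_metrics(y_true, ensemble)["mcc"])         # 1.0
```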


2021, Vol 11
Author(s): Yanjie Zhao, Rong Chen, Ting Zhang, Chaoyue Chen, Muhetaer Muhelisa, ...

Background: Differential diagnosis between benign and malignant breast lesions is crucial for guiding follow-up treatment. Recent developments in texture analysis and machine learning may offer a new solution to this problem. Method: This study enrolled 265 patients (benign:malignant breast lesions = 71:194) who were diagnosed at our hospital and underwent magnetic resonance imaging between January 2014 and August 2017. Patients were randomly divided into a training group and a validation group (4:1), and two radiologists extracted texture features from the contrast-enhanced T1-weighted images. We applied five feature selection methods, namely distance correlation, gradient boosting decision tree (GBDT), least absolute shrinkage and selection operator (LASSO), random forest (RF), and eXtreme gradient boosting (XGBoost), and built five independent classification models based on the linear discriminant analysis (LDA) algorithm. Results: All five models showed promising results in discriminating malignant from benign breast lesions, with areas under the receiver operating characteristic (ROC) curve (AUCs) above 0.830 in both the training and validation groups. The model with the best discriminating ability was the combination of LDA + gradient boosting decision tree (GBDT), whose sensitivity, specificity, AUC, and accuracy in the training group were 0.814, 0.883, 0.922, and 0.868, respectively; LDA + random forest (RF) also showed promising results, with an AUC of 0.906 in the training group. Conclusion: The evidence of this study, while preliminary, suggests that combining MRI texture analysis with the LDA algorithm could discriminate benign from malignant breast lesions. Further multicenter research in this field would help validate the result.
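
The results report sensitivity, specificity, AUC and accuracy. The sketch below shows how those four numbers can be computed from a classifier's scores, using made-up scores rather than the study's LDA outputs and an assumed 0.5 cut-off for the hard predictions. The AUC uses its rank (Mann-Whitney) interpretation: the probability that a randomly chosen malignant case scores higher than a randomly chosen benign one, with ties counting one half.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as one half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Made-up scores: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / len(y_true)
auc = roc_auc(y_true, scores)
print(sensitivity, round(specificity, 3), round(accuracy, 3), round(auc, 3))
# 0.75 0.667 0.714 0.917
```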

