C4.5 Decision Tree Machine Learning Algorithm Based GIS Route Identification

Author(s):  
Pankaj Kumar Dalela ◽  
Prashant Bansal ◽  
Arun Yadav ◽  
Sabyasachi Majumdar ◽  
Anurag Yadav ◽  
...  
2021 ◽  
Vol 5 (2) ◽  
pp. 398
Author(s):  
Pramana Yoga Saputra ◽  
Moch Zawaruddin Abdullah ◽  
Annisa Puspa Kirana

Imbalanced data is a condition in which the number of samples differs between the majority class (classes with very many members) and the minority class (classes with very few members). It can complicate classification, because machine learning algorithms are generally designed to classify balanced data. Oversampling resolves data imbalance by adding synthetic data to the minority class until it has the same volume of data as the majority class. MWMOTE is an oversampling technique that generates synthetic data from members of minority-class clusters that are close to the majority class. This approach generates synthetic data well, but the resulting synthetic data remains near the majority region and is too dense at the cluster borders, which allows it to fall into the majority class during classification. This study aims to improve the synthetic data generation process in MWMOTE so that the resulting data is spread more widely within the minority class. The test results show that the proposed method improves classification performance for the KNN and C4.5 decision tree classifiers by 0.46% and 0.96%, respectively, compared to MWMOTE.
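The oversampling idea can be illustrated with the basic interpolation step used by SMOTE-family methods: a synthetic point is placed at a random position on the line segment between a minority sample and one of its nearby minority neighbours. The following is a minimal Python sketch under that assumption only. It does not reproduce MWMOTE's weighting of informative samples or its clustering, nor the improvement proposed in this study, and the function name, the parameter `k`, and the toy data are hypothetical.

```python
import random


def _dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (plain SMOTE-style interpolation, not MWMOTE's weighting)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding base itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: _dist2(base, p))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # uniform position on the segment base -> partner
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, partner))
        )
    return synthetic


# Hypothetical toy data: 4 minority points vs. a majority class of 10.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=10 - len(minority))
```

Balancing here means generating `majority_count - minority_count` points, matching the goal of giving the minority class the same volume as the majority class.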


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Peter Appiahene ◽  
Yaw Marfo Missah ◽  
Ussiph Najim

The financial crisis that hit Ghana from 2015 to 2018 raised various issues concerning the efficiency of banks and the safety of depositors in the banking industry. As part of measures to improve the banking sector and restore customers' confidence, efficiency and performance analysis in the banking industry has become a pressing issue, because stakeholders need to detect the underlying causes of inefficiency within the industry. Nonparametric methods such as Data Envelopment Analysis (DEA) have been suggested in the literature as a good measure of banks' efficiency and performance. Machine learning algorithms have also been viewed as a good tool for estimating various nonparametric and nonlinear problems. This paper combines DEA with three machine learning approaches to evaluate bank efficiency and performance using 444 Ghanaian bank branches as Decision Making Units (DMUs). The results were compared with the corresponding efficiency ratings obtained from DEA. Finally, the prediction accuracies of the three machine learning models were compared. The results suggest that the decision tree (DT), using the C5.0 algorithm, provided the best predictive model: it achieved 100% accuracy on the holdout sample of 134 branches (30% of the data) with a P value of 0.00. The DT was followed closely by the random forest algorithm, with a predictive accuracy of 98.5% and a P value of 0.00, and finally by the neural network, with 86.6% accuracy and a P value of 0.66. The study concluded that banks in Ghana can use its results to predict their respective efficiencies. All experiments were performed in a simulation environment in RStudio using R code.
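DEA scores each Decision Making Unit relative to the best performers in the sample. In the simplest case, one input and one output under constant returns to scale (the CCR model), the score reduces to a ratio comparison, sketched below in Python. The single input/output setting, the function name, and the branch figures are illustrative assumptions; the study's own DEA specification and variables are not given here.

```python
def ccr_efficiency_single(inputs, outputs):
    """CCR (constant returns to scale) DEA efficiency for the special case of
    one input and one output per DMU: each DMU's output/input ratio divided by
    the best ratio in the sample. Efficient DMUs score 1.0.
    Multiple inputs/outputs require solving one linear program per DMU."""
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]


# Hypothetical branches: input = operating cost, output = loans issued.
costs = [10.0, 20.0, 15.0]
loans = [50.0, 80.0, 75.0]
scores = ccr_efficiency_single(costs, loans)  # [1.0, 0.8, 1.0]
```

In a combined setup like the one described, efficiency ratings of this kind could serve as the targets that the classifiers learn to predict.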


2021 ◽  
Vol 8 ◽  
Author(s):  
Anthime Flaus ◽  
Julie Amat ◽  
Nathalie Prevot ◽  
Louis Olagne ◽  
Lucie Descamps ◽  
...  

Introduction: The aim of this study was to find the best ordered combination of two FDG-positive musculoskeletal sites, using a machine learning algorithm, to diagnose polymyalgia rheumatica (PMR) vs. other rheumatisms in a cohort of patients with inflammatory rheumatisms. Methods: This retrospective study included 140 patients who underwent [18F]FDG PET-CT and whose final diagnosis was inflammatory rheumatism. The cohort was randomly split, stratified on the final diagnosis, into a training cohort and a validation cohort. FDG uptake at 17 musculoskeletal sites was evaluated visually and rated positive if uptake was at least equal to that of the liver. A decision tree classifier was trained and validated to find the best combination of two positive sites to diagnose PMR. Diagnostic performance was measured first for each musculoskeletal site, second for combinations of two positive sites, and third using the decision tree created with machine learning. Results: Fifty-five patients with PMR and 85 patients with other inflammatory rheumatisms were included. Used either individually or in pairs, musculoskeletal sites were highly imbalanced for diagnosing PMR, with high specificity but low sensitivity. The machine learning algorithm identified an optimal ordered combination of two sites to diagnose PMR: a positive interspinous bursa or, if that was negative, a positive trochanteric bursa. Following the decision tree, the sensitivity and specificity for diagnosing PMR were 73.2% and 87.5%, respectively, in the training cohort and 78.6% and 80.1% in the validation cohort. Conclusion: An ordered combination of two visually positive sites leads to a PMR diagnosis with good sensitivity and specificity vs. other rheumatisms in a large cohort of patients with inflammatory rheumatisms.
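The ordered rule reported in the Results can be written as a two-level decision, together with the sensitivity and specificity used to evaluate it. This is a minimal Python sketch; the function names and the six patients are hypothetical and are not the study's data.

```python
def predicts_pmr(interspinous_positive, trochanteric_positive):
    """Ordered two-site rule from the abstract: check the interspinous bursa
    first; only if it is negative, fall back to the trochanteric bursa."""
    if interspinous_positive:
        return True
    return trochanteric_positive


def sensitivity_specificity(predicted, actual):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical patients: (interspinous+, trochanteric+, final diagnosis is PMR)
patients = [
    (True, False, True),
    (False, True, True),
    (False, False, True),
    (False, False, False),
    (True, False, False),
    (False, False, False),
]
predicted = [predicts_pmr(i, t) for i, t, _ in patients]
actual = [pmr for _, _, pmr in patients]
sens, spec = sensitivity_specificity(predicted, actual)  # both 2/3
```

For the final prediction the ordered rule is equivalent to `interspinous or trochanteric`; the ordering reflects the structure of the tree, with the interspinous bursa as the root split.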


As a crime that uses technical means to steal the sensitive data of customers and users on the internet, phishing is currently a serious threat facing the Internet, and losses due to phishing are growing steadily. Detecting these phishing scams is a very challenging problem because phishing is predominantly a semantics-based attack, which exploits human vulnerabilities rather than network or system vulnerabilities. Phishing costs. For software-based detection, two main approaches are commonly used: blacklists/whitelists and machine learning. Each phishing technique has different parameters and attack types. Using a decision tree algorithm, we determine whether a given case is legitimate or a scam. We do this by grouping cases according to diverse parameters and features, which helps the machine learning algorithm learn.
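A decision tree decides which parameter to split on first by measuring how well each feature separates legitimate examples from phishing examples. The sketch below uses information gain (the entropy-based criterion of ID3), which is an assumption: the abstract does not say which criterion or which features were used. The feature names `has_ip_in_url`, `uses_https`, and `long_url` and the four labelled examples are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one feature."""
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def best_split(rows, labels):
    """Feature with the highest information gain (the root split)."""
    return max(rows[0], key=lambda f: information_gain(rows, labels, f))


# Hypothetical binary URL features for four labelled websites.
rows = [
    {"has_ip_in_url": 1, "uses_https": 0, "long_url": 1},
    {"has_ip_in_url": 1, "uses_https": 1, "long_url": 0},
    {"has_ip_in_url": 0, "uses_https": 1, "long_url": 1},
    {"has_ip_in_url": 0, "uses_https": 1, "long_url": 0},
]
labels = ["phishing", "phishing", "legitimate", "legitimate"]
root = best_split(rows, labels)  # "has_ip_in_url"
```

Applying `best_split` recursively to each resulting subset grows the rest of the tree.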


Heart disease is a common problem that can be very severe in old age and in people who do not have a healthy lifestyle. Regular check-ups and diagnosis, together with healthy eating habits, can prevent it to some extent. In this paper, we implement a widely used and important machine learning algorithm to predict heart disease in a patient. The decision tree classifier is built on the patients' symptoms, which serve as the attributes required for prediction. The decision tree algorithm also identifies the attributes that lead to the best predictions on the datasets. It solves the problem using a tree representation, in which each internal node represents an attribute and each leaf node corresponds to a class label. The support vector machine algorithm classifies the data using a kernel function and separates the classes with a hyperplane. The main objective of this project is to help reduce the occurrence of heart disease in patients.
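The tree representation described above (attributes at internal nodes, class labels at leaves) maps directly onto a small data structure. This Python sketch shows only how a trained tree is used for prediction; the attribute names `chest_pain` and `exercise_angina` and the tree shape are hypothetical, not the paper's learned model.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    """Leaf node: holds the predicted class label."""
    label: str


@dataclass
class Node:
    """Internal node: tests one attribute and branches on its value."""
    attribute: str
    children: Dict[str, "Union[Leaf, Node]"]


def predict(tree, patient):
    """Follow attribute tests from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.children[patient[tree.attribute]]
    return tree.label


# Hypothetical tree; attribute names are illustrative only.
tree = Node("chest_pain", {
    "yes": Node("exercise_angina", {
        "yes": Leaf("heart disease"),
        "no": Leaf("no heart disease"),
    }),
    "no": Leaf("no heart disease"),
})
```

A patient record only needs values for the attributes tested along its path, so `{"chest_pain": "no"}` is enough to reach a leaf.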


2020 ◽  
Author(s):  
Juan Chen ◽  
Yong-ran Cheng ◽  
Zhan-hui Feng ◽  
Meng-Yun Zhou ◽  
Nan Wang ◽  
...  

Background: Accurate prediction of the number of patients with conjunctivitis plays an important role in providing adequate treatment at the hospital, but no such accurate predictive model currently exists. This study sought to use machine learning (ML) to predict conjunctivitis cases based on past patient data and several air pollutants, and to select the optimal ML model for predicting the number of conjunctivitis-related patients. Methods: The average daily concentrations of air pollutants (CO, O3, NO2, SO2, PM10, PM2.5) and weather data (highest and lowest temperature) were collected. Data were randomly divided into a training dataset and a test dataset, and the normalized mean square error (NMSE) was calculated by 10-fold cross-validation to compare the ability of seven ML methods to predict the number of patients with conjunctivitis (Lasso penalized linear model, decision tree, boosting regression, bagging regression, random forest, support vector, and neural network). Based on prediction accuracy, the important air and weather factors affecting conjunctivitis were identified. Results: A total of 84,977 conjunctivitis cases were obtained from the ophthalmology center of the Affiliated Hospital of Hangzhou Normal University. For all patients together, the NMSE of the different methods was as follows: Lasso penalized linear regression, 0.755; decision tree, 0.710; boosting regression, 0.616; bagging regression, 0.615; random forest, 0.392; support vector, 0.688; and neural network, 0.476. Further analyses stratified by gender and age at diagnosis supported random forest as superior to the other ML methods. The main factors affecting conjunctivitis were O3, NO2, SO2, and air temperature. Conclusion: Machine learning algorithms can predict the number of patients with conjunctivitis; among them, the random forest algorithm had the highest accuracy.
Machine learning algorithms could provide accurate information for hospitals dealing with conjunctivitis caused by air factors.
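The evaluation protocol (10-fold cross-validation scored by NMSE) can be sketched in Python as follows. NMSE has several definitions in the literature; this sketch assumes the common one that divides the squared error by the squared deviation of the observed counts from their mean, so a model that always predicts the mean of the observed values scores exactly 1 and lower is better. The function names and the seed are assumptions, not the authors' code.

```python
import random


def nmse(y_true, y_pred):
    """Normalised MSE: sum of squared errors divided by the sum of squared
    deviations of y_true from its mean (lower is better)."""
    mean = sum(y_true) / len(y_true)
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean) ** 2 for t in y_true)
    return sse / sst


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation after a
    random shuffle; each index lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

For each of the seven models, one would fit on the `train` indices, predict the daily patient counts at the `test` indices, and average `nmse` over the ten folds.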


2021 ◽  
Vol 9 (1) ◽  
Author(s):  
Maad M. Mijwil ◽  
Rana A. Abttan

A decision tree (DT) is one of the most popular machine learning algorithms; it divides data repeatedly to form groups or classes. It is a supervised learning algorithm that can be used on discrete or continuous data for classification or regression. The most traditional classifier of this kind is the C4.5 decision tree, which is the focus of this research. This classifier can be built from large datasets and keeps growing until it reaches the desired goal. Its drawback is that it can contain unnecessary nodes and branches, which lead to overfitting, and overfitting can degrade classification performance. The authors therefore propose using a genetic algorithm to prune the tree and reduce overfitting. The study uses four datasets from the UC Irvine (UCI) Machine Learning Repository: IRIS, Car Evaluation, GLASS, and WINE. The experimental results confirm the effectiveness of the genetic algorithm in reducing overfitting on all four datasets and in optimizing the confidence factor (CF) of the C4.5 decision tree. The proposed method reached about 92% accuracy.
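C4.5 chooses splits by gain ratio, which divides information gain by the split information of the attribute so that attributes with many distinct values are not automatically preferred. A minimal Python sketch for one categorical attribute follows. It omits C4.5's handling of continuous attributes and missing values as well as the confidence-factor-based pruning that the genetic algorithm tunes, and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 split criterion for one categorical attribute:
    information gain divided by split information."""
    n = len(labels)
    partitions = {}
    for v, lab in zip(values, labels):
        partitions.setdefault(v, []).append(lab)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - remainder
    split_info = -sum(
        len(p) / n * math.log2(len(p) / n) for p in partitions.values()
    )
    return gain / split_info if split_info > 0 else 0.0


labels = ["x", "x", "y", "y"]
binary_attr = ["a", "a", "b", "b"]   # gain 1 bit, split info 1 bit -> 1.0
id_like_attr = ["a", "b", "c", "d"]  # gain 1 bit, split info 2 bits -> 0.5
```

Both attributes separate the classes perfectly, but the ID-like attribute is penalised for fragmenting the data into single-example branches, the kind of unnecessary branching that pruning later removes.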

