Predicting Diabetes Disease using Random Forest Tree (RFT) Data Mining Technique

Diabetes is a condition that occurs when blood glucose, also known as blood sugar, is too high. Blood sugar is the body's primary source of energy and comes from the food you eat. Insulin, a hormone produced by the pancreas, helps glucose from food enter the cells, where it is used for energy. The name is also shared by an unrelated condition, "diabetes insipidus", which involves problems with how the kidneys process fluids. Insulin is the key to the cells' ability to use glucose, so problems with the processing of insulin or with how cells respond to insulin can easily throw the body's carefully balanced glucose metabolism out of control [1]. When either of these problems occurs, diabetes emerges: blood sugar levels rise and crash, and the risk of organ damage increases. Earlier prediction of diabetes could allow proper treatment and protect people from avoidable illness. Data mining, which is widely used in healthcare organizations for decision making and disease detection, can be applied to this prediction. In this paper, data were collected from the UCI repository, and the data mining tool WEKA is used to predict diabetes. The dataset contains 768 instances, of which 500 are tested negative and 268 are tested positive. An experimental study is carried out using a data mining classification technique, the Random Forest Tree (RFT) classifier, to predict diabetes. We evaluated k-fold cross-validation with different values of k to achieve better accuracy and found that k = 8 gives the highest accuracy, 76.69%, compared with the other values tested.
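The k-fold evaluation described above can be sketched in plain Python. This is a minimal illustration, not the authors' WEKA setup: the trainer interface and the `zero_r` placeholder classifier are hypothetical stand-ins for WEKA's random forest, and the folds here are shuffled but not stratified. With the 500/268 class split, always predicting the majority class already scores 500/768 ≈ 65.10%, a useful reference point for the reported 76.69%.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Row = Sequence[float]
# Hypothetical trainer interface: fit on (rows, labels), return a predict function.
Trainer = Callable[[List[Row], List[str]], Callable[[Row], str]]


def k_fold_indices(n: int, k: int, seed: int = 1) -> List[List[int]]:
    """Shuffle 0..n-1 once, then deal the indices into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(X: List[Row], y: List[str], train: Trainer, k: int) -> float:
    """Accuracy pooled over k folds; every instance is tested exactly once."""
    correct = 0
    for test_idx in k_fold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        predict = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict(X[i]) == y[i] for i in test_idx)
    return correct / len(y)


def zero_r(X_train: List[Row], y_train: List[str]) -> Callable[[Row], str]:
    """Placeholder classifier (not RFT): always predict the training majority class."""
    majority = Counter(y_train).most_common(1)[0][0]
    return lambda row: majority


if __name__ == "__main__":
    # Same class counts as the dataset in the abstract; the features are dummies.
    y = ["tested_negative"] * 500 + ["tested_positive"] * 268
    X = [[0.0] for _ in y]
    for k in range(2, 11):
        print(k, round(cross_val_accuracy(X, y, zero_r, k), 4))
```

Because ZeroR ignores the features, every k gives the same pooled accuracy, 500/768; only a real learner, such as a random forest trainer plugged into the same `Trainer` slot, makes the choice of k matter.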

Author(s):  
Winner Walecha and Dr. Bhoomi Gupta

This paper presents a salary prediction system based on job listings from an employment website, in this case Glassdoor.com. Job postings are scraped from the website and cleaned, and a data mining model is built on a number of factors, including rival companies, revenue and required skills, to predict the salary to be expected when applying for a data science job. Linear regression, lasso regression and random forest regressors are optimised with GridSearchCV to reach the best model. The model can be further extended into a Flask API and deployed on the internet for public use.
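The abstract tunes its regressors with GridSearchCV, scikit-learn's exhaustive grid search scored by cross-validation. The plain-Python sketch below reproduces that idea on a deliberately small problem: a one-feature lasso regression with no intercept, whose slope has a closed-form soft-thresholding solution. The toy data, the `lam` grid and all function names are hypothetical; this is neither the paper's model nor scikit-learn's API.

```python
import itertools
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def lasso_1d(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """Slope w minimising 0.5*sum((y - w*x)^2) + lam*|w| (one feature, no intercept).

    Setting the subgradient to zero gives the soft-thresholding rule below;
    lam = 0 reduces to ordinary least squares through the origin.
    """
    rho = sum(x * y for x, y in zip(xs, ys))
    z = sum(x * x for x in xs)
    if rho > lam:
        return (rho - lam) / z
    if rho < -lam:
        return (rho + lam) / z
    return 0.0


def cv_neg_mse(xs: Sequence[float], ys: Sequence[float], lam: float, k: int = 5) -> float:
    """Negative mean squared error over k interleaved folds (higher is better)."""
    n = len(xs)
    sq_err = 0.0
    for f in range(k):
        test = set(range(f, n, k))
        train = [i for i in range(n) if i not in test]
        w = lasso_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        sq_err += sum((ys[i] - w * xs[i]) ** 2 for i in test)
    return -sq_err / n


def grid_search(grid: Dict[str, List[float]],
                score: Callable[[Dict[str, float]], float]) -> Tuple[Optional[Dict[str, float]], float]:
    """Try every combination in the grid and keep the best-scoring one."""
    names = sorted(grid)
    best: Optional[Dict[str, float]] = None
    best_score = float("-inf")
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        s = score(params)
        if s > best_score:
            best, best_score = params, s
    return best, best_score


if __name__ == "__main__":
    # Hypothetical toy data: "salary" exactly 10x a single feature.
    xs = [float(v) for v in range(1, 11)]
    ys = [10.0 * v for v in xs]
    best, s = grid_search({"lam": [0.0, 1.0, 10.0, 100.0]},
                          lambda p: cv_neg_mse(xs, ys, p["lam"]))
    print(best, s)
```

With noise-free data the search picks `lam = 0.0`, i.e. plain least squares; on noisy real listings a positive penalty can win because it trades a little bias for lower variance.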


2021, Vol. 2 (1), pp. 33-44
Author(s):
Sinta Septi Pangastuti, Kartika Fithriasari, Nur Iriawan and Wahyuni Suryaningtyas

Data mining techniques in the education sector have begun to evolve along with the development of technology and the growing amount of data that can be stored in education database systems. One such database holds the records of Bidikmisi scholarships in Indonesia. In this study, the Bidikmisi data are classified using data mining classification techniques: random forest in combination with boosting and bagging algorithms. These algorithms are also combined with the SMOTE algorithm to handle class imbalance in the dataset. Based on the performance criteria G-mean and AUC, the algorithms combined with SMOTE tended to perform better. The classification accuracy of each method was above 90%.
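The two ideas the abstract relies on, SMOTE oversampling and the G-mean criterion, can be sketched in plain Python. SMOTE creates a synthetic minority example by picking a minority point, choosing one of its k nearest minority neighbours and interpolating a random fraction of the way between them. G-mean is the geometric mean of sensitivity and specificity, so a classifier that ignores the minority class scores 0 even when its accuracy is high. Function names, the default k = 5 and the toy data are illustrative assumptions, not the study's implementation.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Generate n_new synthetic minority points by SMOTE-style interpolation.

    Assumes at least two minority points, so every point has a neighbour.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic: List[Point] = []
    for j in range(n_new):
        i = j % len(minority)  # cycle through the minority samples
        base = minority[i]
        others = [p for idx, p in enumerate(minority) if idx != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, nb)])
    return synthetic


def g_mean(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Geometric mean of sensitivity (TP rate) and specificity (TN rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    # A majority-only classifier: 90% accuracy on this toy set, but G-mean 0.
    print(g_mean([0] * 9 + [1], [0] * 10))
    print(smote([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], n_new=4, k=2))
```

Every synthetic point lies on a segment between two real minority points, which is why SMOTE enlarges the minority region without simply duplicating rows.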


Diabetes is a disease that is growing nowadays, and a large number of patients around the world suffer from it. Medical data are very large and relate to many diseases, so the first step is to choose a mining tool that gives the best results for the given database; this medical data is statistical, and most researchers use this type of data. A data mining tool is used to extract more accurate results from the diabetes database, and through data mining techniques medical experts and researchers can analyze the results and provide the best treatment for this disease. In this paper we apply diabetes data to Rattle, an open source data mining tool, and use two classification methods, decision tree and random forest tree, to classify the data and show which classification algorithm is best for the diabetes dataset.
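The two methods compared in the abstract differ mainly in how they use the training data. The plain-Python sketch below grows a single Gini-split decision tree on all rows and all features, and a random forest as many such trees, each fit on a bootstrap sample, searching only a random subset of about √d features at each split, and combined by majority vote. It is a simplified illustration under assumed defaults (Gini impurity, midpoint thresholds, 25 trees, depth 5), not Rattle's implementation.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple, Union

Row = Sequence[float]
# A tree is either a leaf (class label) or (feature, threshold, left, right).
Tree = Union[str, Tuple[int, float, "Tree", "Tree"]]


def gini(labels: List[str]) -> float:
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(X: List[Row], y: List[str], depth: int, max_depth: int,
               n_features: int, rng: random.Random) -> Tree:
    """Grow a Gini-split tree; each split examines n_features randomly chosen features."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    best = None  # (weighted Gini, feature, threshold)
    for f in rng.sample(range(len(X[0])), n_features):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # sampled features are constant here: make a leaf
        return majority
    _, f, t = best

    def grow(idx: List[int]) -> Tree:
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_features, rng)

    return (f, t,
            grow([i for i, row in enumerate(X) if row[f] <= t]),
            grow([i for i, row in enumerate(X) if row[f] > t]))


def predict_tree(node: Tree, row: Row) -> str:
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def random_forest(X: List[Row], y: List[str], n_trees: int = 25, max_depth: int = 5,
                  n_features: int = 0, seed: int = 0) -> Callable[[Row], str]:
    """Bagged trees with random feature subsets, combined by majority vote."""
    rng = random.Random(seed)
    m = n_features or max(1, int(math.sqrt(len(X[0]))))  # default: about sqrt(d)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]  # bootstrap sample
        trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                0, max_depth, m, rng))

    def predict(row: Row) -> str:
        votes = Counter(predict_tree(t, row) for t in trees)
        return votes.most_common(1)[0][0]

    return predict


if __name__ == "__main__":
    # Hypothetical toy data: one feature, two well-separated classes.
    X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
    y = ["neg"] * 4 + ["pos"] * 4
    tree = build_tree(X, y, 0, 5, len(X[0]), random.Random(0))
    forest = random_forest(X, y)
    print([predict_tree(tree, r) for r in X], [forest(r) for r in X])
```

The single tree is the special case with one tree, no bootstrap and all features searched; the forest's extra randomness is what lets averaging reduce variance on noisier data.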


Author(s):  
Hana Rashied Esmaeel

The paper applies data mining techniques: five classification algorithms (ZeroR, SMO, Naive Bayes, J48 and Random Forest) were used to build models from the data. The analysis was implemented using the WEKA (3.8.2) data mining software tool. The data were collected from the College of Information Engineering (COIE) at Al Nahrain University through a "Referendum" survey form to estimate teacher performance; they were stored in an Excel file in CSV format and then converted to ARFF (Attribute-Relation File Format). Several criteria, such as the time taken to build the models, accuracy and average error, were used to evaluate the algorithms. Random Forest and SMO predict better than the other algorithms, since their accuracy is the highest and their average error the lowest. "The teacher clarification and wanting to be useful to students" was the strongest attribute. Furthermore, removing the low-ranked attributes (10, 11, 12 and 14), which have less impact on the dataset, can increase the accuracy of the algorithms.

