Prognosticate and diagnosis of diabetes using data preprocessing and null value removal on the modified data set with possible outcome of a decision tree construction through R- programming

Author(s):  
K. Kanmani ◽  
Dr. A. Murugan

Data mining is a good choice for an emerging research field: soil data analysis. Crop yield prediction is an important issue when selecting a crop. Traditionally, a farmer predicts the crop from experience with a particular type of field and crop, based on factors such as soil type, climatic conditions, season, weather, rainfall, and irrigation facilities. Data mining techniques are a better choice for predicting the crop. Soil analysis plays an important role in agriculture, and soil fertility prediction is one of its most important factors. This research work predicts crop yield using a decision tree algorithm. Its aim is to assess the accuracy of the prediction and to find the yield of the crop: the C4.5 decision tree algorithm is used to predict crop yield in R programming, and also to find the range of magnesium in the collected soil data set. This prediction will be very useful to farmers in predicting crop yield for cultivation.
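
The abstract names C4.5 but gives no detail. As a rough illustration, C4.5 chooses the attribute to split on by gain ratio (information gain divided by split information). The sketch below is in Python rather than the paper's R. The toy records and the attribute names `soil_type`, `rainfall`, and `yield` are hypothetical and are not taken from the paper's soil data set.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gain_ratio(rows, attribute, target):
    """C4.5 split criterion: information gain divided by split information."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    total = len(rows)
    # Weighted entropy of the target after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    # Split information is the entropy of the attribute's own value distribution.
    split_info = entropy([r[attribute] for r in rows])
    gain = base - remainder
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy records, for illustration only.
soil = [
    {"soil_type": "clay", "rainfall": "high", "yield": "good"},
    {"soil_type": "clay", "rainfall": "low", "yield": "poor"},
    {"soil_type": "loam", "rainfall": "high", "yield": "good"},
    {"soil_type": "loam", "rainfall": "low", "yield": "good"},
    {"soil_type": "sandy", "rainfall": "high", "yield": "poor"},
    {"soil_type": "sandy", "rainfall": "low", "yield": "poor"},
]

ratios = {a: gain_ratio(soil, a, "yield") for a in ("soil_type", "rainfall")}
best_attribute = max(ratios, key=ratios.get)
print(best_attribute, ratios)
```

A full C4.5 implementation would apply this criterion recursively, handle numeric attributes with thresholds, and prune the resulting tree; none of that is shown here.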


Author(s):  
Saadia Karim

The purpose of this paper is to analyze climate change in Pakistan, identify issues related to weather disasters, and revisit weather prediction approaches. The proposed approach is based on a comparison of different algorithms using the past 5 years (2010-2015) of data on 12 attributes. A flow diagram identifies the steps included in the process. Results are obtained using WEKA 3.7.13 (the latest version in 2015). The KNN algorithm and the memory-based reasoning algorithm show the accuracy of predicting weather forecasts. The BPANN algorithm is used to analyze the data set along with the KNN and memory-based reasoning algorithms. The decision tree shows the accuracy of predicting weather forecasts. KNN is used with a Bayesian approach in this research. The attributes used in this research show significant relationships, while many of them work as independent variables. Since these attributes are very important for weather prediction, we used variant factors based on time and date. The KNN algorithm using a Bayesian classifier provides accurate results compared with the memory-based reasoning of Decision Tree and BPANN trainlm and trainbr.


Author(s):  
Umar Sidiq ◽  
Syed Mutahar Aaqib ◽  
Rafi Ahmad Khan

Classification is one of the most important supervised learning techniques in data mining, used to assign data to predefined classes. It is widely used in the healthcare sector for decision making, diagnosis systems, and giving better treatment to patients. In this work, the data set is taken from a recognized lab in Kashmir. The research work is carried out with ANACONDA3-5.2.0, an open source platform, in a Windows 10 environment. An experimental study is carried out using classification techniques such as k-nearest neighbors, support vector machine, decision tree, and Naïve Bayes. The decision tree obtained the highest accuracy, 98.89%, among these classification techniques.


2010 ◽  
Vol 6 (3) ◽  
pp. 28-42 ◽  
Author(s):  
Bijan Raahemi ◽  
Ali Mumtaz

This paper presents a new approach using data mining techniques, and in particular a two-stage architecture, for classification of Peer-to-Peer (P2P) traffic in IP networks where in the first stage the traffic is filtered using standard port numbers and layer 4 port matching to label well-known P2P and NonP2P traffic. The labeled traffic produced in the first stage is used to train a Fast Decision Tree (FDT) classifier with high accuracy. The Unknown traffic is then applied to the FDT model which classifies the traffic into P2P and NonP2P with high accuracy. The two-stage architecture not only classifies well-known P2P applications, but also classifies applications that use random or non-standard port numbers and cannot be classified otherwise. The authors captured the internet traffic at a gateway router, performed pre-processing on the data, selected the most significant attributes, and prepared a training data set to which the new algorithm was applied. Finally, the authors built several models using a combination of various attribute sets for different ratios of P2P to NonP2P traffic in the training data.
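
The two-stage idea described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the port sets, the single `peers` feature (number of distinct remote hosts per flow), and the toy flows are all assumptions, and a one-level decision stump stands in for the Fast Decision Tree classifier named in the abstract.

```python
# Stage 1: label flows by well-known ports. These port sets are illustrative
# assumptions, not the lists used in the paper.
P2P_PORTS = {6881, 6346, 4662}      # common BitTorrent, Gnutella, eDonkey defaults
NON_P2P_PORTS = {80, 443, 25, 53}   # HTTP, HTTPS, SMTP, DNS

def label_by_port(flow):
    """Return 'P2P', 'NonP2P', or None when the port is not well known."""
    if flow["port"] in P2P_PORTS:
        return "P2P"
    if flow["port"] in NON_P2P_PORTS:
        return "NonP2P"
    return None

def train_stump(labeled, feature):
    """Stage 2 stand-in: pick the threshold on one numeric feature that
    makes the fewest errors on the port-labeled flows."""
    values = sorted({flow[feature] for flow, _ in labeled})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        for above, below in (("P2P", "NonP2P"), ("NonP2P", "P2P")):
            errors = sum(
                (above if flow[feature] > threshold else below) != label
                for flow, label in labeled
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above, below)
    _, threshold, above, below = best
    return lambda flow: above if flow[feature] > threshold else below

# Hypothetical flows, for illustration only.
flows = [
    {"port": 6881, "peers": 40},
    {"port": 6346, "peers": 25},
    {"port": 80, "peers": 2},
    {"port": 443, "peers": 3},
    {"port": 51413, "peers": 30},  # non-standard port, unknown after stage 1
    {"port": 50000, "peers": 1},   # non-standard port, unknown after stage 1
]

labeled = [(f, label_by_port(f)) for f in flows if label_by_port(f) is not None]
unknown = [f for f in flows if label_by_port(f) is None]

classify = train_stump(labeled, "peers")
predictions = [classify(f) for f in unknown]
print(predictions)
```

The point of the design is that stage 1 produces training labels cheaply, so stage 2 can learn port-independent behaviour and apply it to flows that stage 1 cannot label.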


Author(s):  
Zeye Liu ◽  
Xiangbin Pan ◽  

Objective: To analyze the performance of each algorithm model under different processing conditions, such as data preprocessing (standardization, normalization, and regularization), balancing, and shuffling, based on the data attributes of three common research types in clinical studies as research examples; and to compare the advantages and disadvantages of the decision tree model and the neural network model in clinical studies, as well as their scope of application. Methods: Python was used to construct ID3 and CART decision tree models. Three typical clinical research data sets were downloaded from UCI, and data preprocessing, balancing, and shuffling were applied before fitting the models. The model evaluation indexes included time complexity, accuracy, precision, recall, and F1-score. For visualization, the model results, confusion matrix, and ROC curve were drawn. The importance rankings of the different data set attributes for the model results were also analyzed. In addition, one typical data set was selected for a comparative analysis using the neural network model. SPSS was used to perform the significance analysis of the different data processing schemes and the statistical tests of the results. Results: (1) There were a total of 96 decision trees based on 2 decision tree algorithms, 3 data sets, 4 types of data preprocessing, 2 balancing choices, and 2 shuffling choices. (2) The AUC value of the Thoracic Surgery Data Set increased significantly after balancing, with a maximum increase of 0.3, which was statistically significant (P < 0.01). (3) The AUC value of the Breast Cancer Wisconsin (Diagnostic) Data Set generally increased after normalization and decreased after regularization. The maximum decrease was 0.6, without statistical significance (P = 0.3). (4) The AUC value of the Statlog (Heart) Data Set increased after regularization, but the increase was not statistically significant. The maximum increase was 0.03. 
(5) Data balancing and shuffling can increase the AUC value. (6) The performance of the neural network model was between the best and worst performance of the decision tree model.
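
The count of 96 decision trees follows from the full factorial design in result (1): 2 × 3 × 4 × 2 × 2 = 96. A short sketch of enumerating that grid is shown below. Treating "none" as the fourth preprocessing type is an assumption, since the abstract names only three transformations.

```python
from itertools import product

algorithms = ["ID3", "CART"]
datasets = [
    "Thoracic Surgery",
    "Breast Cancer Wisconsin (Diagnostic)",
    "Statlog (Heart)",
]
# "none" as the fourth preprocessing option is an assumption.
preprocessing = ["none", "standardization", "normalization", "regularization"]
balancing = [False, True]
shuffling = [False, True]

# Every combination of the five factors is one experimental configuration.
configurations = list(
    product(algorithms, datasets, preprocessing, balancing, shuffling)
)
print(len(configurations))
```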


2019 ◽  
Vol 23 (6) ◽  
pp. 670-679
Author(s):  
Krista Greenan ◽  
Sandra L. Taylor ◽  
Daniel Fulkerson ◽  
Kiarash Shahlaie ◽  
Clayton Gerndt ◽  
...  

OBJECTIVE: A recent retrospective study of severe traumatic brain injury (TBI) in pediatric patients showed similar outcomes in those with a Glasgow Coma Scale (GCS) score of 3 and those with a score of 4 and reported a favorable long-term outcome in 11.9% of patients. Using decision tree analysis, authors of that study provided criteria to identify patients with a potentially favorable outcome. The authors of the present study sought to validate the previously described decision tree and further inform understanding of the outcomes of children with a GCS score 3 or 4 by using data from multiple institutions and machine learning methods to identify important predictors of outcome. METHODS: Clinical, radiographic, and outcome data on pediatric TBI patients (age < 18 years) were prospectively collected as part of an institutional TBI registry. Patients with a GCS score of 3 or 4 were selected, and the previously published prediction model was evaluated using this data set. Next, a combined data set that included data from two institutions was used to create a new, more statistically robust model using binomial recursive partitioning to create a decision tree. RESULTS: Forty-five patients from the institutional TBI registry were included in the present study, as were 67 patients from the previously published data set, for a total of 112 patients in the combined analysis. The previously published prediction model for survival was externally validated and performed only modestly (AUC 0.68, 95% CI 0.47, 0.89). In the combined data set, pupillary response and age were the only predictors retained in the decision tree. Ninety-six percent of patients with bilaterally nonreactive pupils had a poor outcome. If the pupillary response was normal in at least one eye, the outcome subsequently depended on age: 72% of children between 5 months and 6 years old had a favorable outcome, whereas 100% of children younger than 5 months old and 77% of those older than 6 years had poor outcomes. 
The overall accuracy of the combined prediction model was 90.2% with a sensitivity of 68.4% and specificity of 93.6%. CONCLUSIONS: A previously published survival model for severe TBI in children with a low GCS score was externally validated. With a larger data set, however, a simplified and more robust model was developed, and the variables most predictive of outcome were age and pupillary response.
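
The two-variable tree described in the results can be written as a pair of nested rules. The sketch below is only an illustration of that structure, not the authors' model and not a clinical tool. Each leaf returns its majority outcome as reported in the abstract. The function name, the encoding of pupillary response as a count of reactive eyes, and the inclusive handling of the 5-month and 6-year boundaries are all assumptions.

```python
def predict_outcome(reactive_pupils, age_months):
    """Return the majority outcome of the matching leaf.

    reactive_pupils: number of eyes with a normal pupillary response (0, 1, or 2).
    age_months: patient age in months; 6 years is taken as 72 months.
    """
    if reactive_pupils == 0:
        # Bilaterally nonreactive pupils: 96% had a poor outcome.
        return "poor"
    if 5 <= age_months <= 72:
        # Between 5 months and 6 years: 72% had a favorable outcome.
        # Inclusive boundaries are an assumption.
        return "favorable"
    # Younger than 5 months (100% poor) or older than 6 years (77% poor).
    return "poor"

print(predict_outcome(0, 36))   # nonreactive pupils
print(predict_outcome(2, 36))   # reactive pupils, 3 years old
print(predict_outcome(1, 120))  # reactive pupil, 10 years old
```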


Author(s):  
Dhilsath Fathima.M ◽  
S. Justin Samuel ◽  
R. Hari Haran

Aim: This work develops an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine learning based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model uses mean imputation to fill in missing values in the data set, then applies principal component analysis (PCA) to extract the optimal features and enhance the performance of the classifiers. After PCA, the reduced features are partitioned into a training dataset and a testing dataset: 70% of the data is given as input to train four popular classifiers (support vector machine, k-nearest neighbor, logistic regression, and decision tree), and the remaining 30% is used to evaluate the output of the machine learning model using the confusion matrix, classifier accuracy, precision, sensitivity, F1-score, and AUC-ROC curve as performance metrics. Results: The output of the classifiers is evaluated using these performance measures. We observed that logistic regression provides higher accuracy than the k-NN, SVM, and decision tree classifiers, and that PCA performs well as a feature extraction method for enhancing the performance of the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the classifiers in Figures 4 and 5 show that logistic regression exhibits a good AUC-ROC score, around 70%, compared to the k-NN and decision tree algorithms. 
Conclusion: From the result analysis, we infer that the proposed machine learning model will act as an optimal decision-making system that predicts acute myocardial infarction at an earlier stage than existing machine learning based prediction models. It is capable of predicting the presence of acute myocardial infarction in humans from heart disease risk factors, in order to decide when to start lifestyle modification and medical treatment to prevent heart disease.
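
Two of the preprocessing steps in the Methods, mean imputation and the 70/30 partition, can be sketched with the Python standard library alone. The records below are hypothetical (two made-up numeric columns), the helper names are illustrative, and the PCA and classifier steps are omitted. The sketch follows the abstract's order of imputing before splitting; a stricter protocol would compute the column means on the training portion only.

```python
import random

def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in that column."""
    n_cols = len(rows[0])
    means = []
    for c in range(n_cols):
        observed = [r[c] for r in rows if r[c] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[c] if v is None else v for c, v in enumerate(r)] for r in rows]

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical records with two numeric columns; None marks a missing value.
records = [
    [39, 195], [46, 250], [48, 245], [61, 225], [46, None],
    [43, 228], [63, 205], [None, 313], [52, 260], [43, 225],
]

imputed = impute_column_means(records)
train, test = train_test_split(imputed)
print(len(train), len(test))
```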


2009 ◽  
Vol 147-149 ◽  
pp. 588-593 ◽  
Author(s):  
Marcin Derlatka ◽  
Jolanta Pauk

This paper proposes a procedure for processing biomechanical data. It consists of selecting suitable noise-free data, preprocessing the data by means of model identification and Kernel Principal Component Analysis, and then classifying it using a decision tree. The results of classification into groups (normal gait and two selected gait pathologies: Spina Bifida and Cerebral Palsy) were very good.


2011 ◽  
Vol 243-249 ◽  
pp. 6292-6295 ◽  
Author(s):  
Rong Yau Huang ◽  
Li Hsu Yeh ◽  
Hao Hsien Chen ◽  
Jyh Dong Lin ◽  
Ping Fu Chen ◽  
...  

This study examines construction waste generation and management in Taiwan. We examine the factors likely to affect the output of construction waste, using data on the declared construction waste produced by demolition projects in Taiwan in the last year, expert interviews, and past research results. Cross-correlation using algorithms such as K-Means and Decision Tree C5.0 shows that "on-site separation" is the factor that affects the output of construction waste. The output (0.092 t/m³ with on-site separation or 0.329 t/m³ without on-site separation) is highly correlated with the composition ratio of the construction waste, and this is taken as a valid conclusion.

