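Application of Data Mining to Predict the Potential of Blood Donors to Become Regular Donors Using the C4.5 Decision Tree Method (Penerapan Data Mining Untuk Memprediksi Potensi Pendonor Darah Menjadi Pendonor Tetap Menggunakan Metode Decision Tree C.45)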

S CIES ◽  
2018 ◽  
Vol 7 (2) ◽  
pp. 101-108
Author(s):  
Ketut Jaya Atmaja ◽  
Ida Bagus Gede Anandita ◽  
Ni Kadek Ceryna Dewi

Blood supplies at a hospital are sometimes unpredictable, so the blood bank needs an adequate stock to ensure that blood is available whenever it is needed. In practice, however, the supply often fluctuates. Data mining is expected to make it possible to use the donor data held by PMI (the Indonesian Red Cross) to predict which donors have the potential to become regular donors. The data mining method used in this process is the C4.5 algorithm. The analysis used 600 randomly selected records, of which 500 served as training data and the remaining 100 formed the (test) data set. From the decision tree produced by the C4.5 algorithm, it can be concluded that private-sector employees over 26 years of age most often become donors.
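The 500/100 partition described above can be sketched as a seeded random split. This is a minimal illustration only: the function name `split_records`, the seed, and the integer placeholders standing in for donor records are assumptions, not details from the paper.

```python
import random

def split_records(records, n_train, seed=0):
    """Shuffle a copy of the records and split off the first n_train for training."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder data: 600 integer IDs standing in for the donor records.
records = list(range(600))
train, test = split_records(records, n_train=500)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, so a tree trained on `train` can be re-evaluated on the same held-out `test` records.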

Author(s):  
Conrad S. Tucker ◽  
Harrison M. Kim

The formulation of a product portfolio requires extensive knowledge about the product market space and also the technical limitations of a company’s engineering design and manufacturing processes. A design methodology is presented that significantly enhances the product portfolio design process by eliminating the need for an exhaustive search of all possible product concepts. This is achieved through a decision tree data mining technique that generates a set of product concepts that are subsequently validated in the engineering design using multilevel optimization techniques. The final optimal product portfolio evaluates products based on the following three criteria: (1) it must satisfy customer price and performance expectations (based on the predictive model) defined here as the feasibility criterion; (2) the feasible set of products/variants validated at the engineering level must generate positive profit that we define as the optimality criterion; (3) the optimal set of products/variants should be a manageable size as defined by the enterprise decision makers and should therefore not exceed the product portfolio limit. The strength of our work is to reveal the tremendous savings in time and resources that exist when decision tree data mining techniques are incorporated into the product portfolio design and selection process. Using data mining tree generation techniques, a customer data set of 40,000 responses with 576 unique attribute combinations (entire set of possible product concepts) is narrowed down to 46 product concepts and then validated through the multilevel engineering design response of feasible products. A cell phone example is presented and an optimal product portfolio solution is achieved that maximizes company profit, without violating customer product performance expectations.


2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Win-Tsung Lo ◽  
Yue-Shan Chang ◽  
Ruey-Kai Sheu ◽  
Chun-Chieh Chiu ◽  
Shyan-Ming Yuan

The decision tree is one of the best-known classification methods in data mining. Much research has focused on improving decision tree performance. However, those algorithms were developed for and run on traditional distributed systems, so latency cannot be improved when processing the huge volumes of data generated by ubiquitous sensing nodes without the help of new technology. To reduce processing latency in large-scale data mining, we design and implement a new parallelized decision tree algorithm, CUDT, on CUDA (Compute Unified Device Architecture), a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We conducted many experiments to evaluate the performance of CUDT and compared it with a traditional CPU version. The results show that CUDT is 5 to 55 times faster than Weka-j48 and achieves an 18-fold speedup over SPRINT on large data sets.


Data mining is a good choice for an emerging research field: soil data analysis. Crop yield prediction is an important issue in selecting a crop. Previously, crop prediction relied on the farmer's experience with a particular type of field and crop, based on factors such as soil type, climatic conditions, season, weather, rainfall, and irrigation facilities. Data mining techniques are a better choice for predicting the crop. Soil analysis plays an important role in the agricultural field, and soil fertility prediction is one of the most important factors in agriculture. This research work predicts crop yield using a decision tree algorithm. Its aim is to assess prediction accuracy and to estimate crop yield using a decision tree, specifically the C4.5 algorithm implemented in R, and also to find the range of magnesium in the collected soil data set. This prediction will be very useful to farmers in predicting crop yield for cultivation.


2020 ◽  
Vol 3 (1) ◽  
pp. 40-54
Author(s):  
Ikong Ifongki

Data mining is a series of processes for extracting added value from a data set in the form of knowledge that could not be found manually. Data mining techniques are expected to reveal knowledge previously hidden in the data warehouse and turn it into valuable information. The C4.5 algorithm is a widely used decision tree classification algorithm because of its main advantages over other algorithms: it produces decision trees that are easy to interpret, has an acceptable level of accuracy, handles discrete attributes efficiently, and can handle both discrete and numeric attributes. Like other decision tree techniques, C4.5 outputs a decision tree, a structure that divides a large data set into smaller sets of records by applying a series of decision rules, so that the members of each resulting set become more similar to one another with each division. This case study examines the factors affecting coffee sales by processing 106 records sampled from 1087 coffee sales records at PT. JPW Indonesia. The sampled data are processed manually in Microsoft Excel and with the RapidMiner software. The results of the C4.5 calculation show that the Quantity and Price attributes strongly affect coffee sales, so sales at PT. JPW Indonesia are still often unstable.
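To make the division step concrete, the sketch below computes the gain ratio, the criterion C4.5 uses to rank candidate attributes for a split. It covers discrete attributes only (numeric attributes would first need a threshold), and the four toy records with `price`, `quantity`, and `sales` values are hypothetical, not taken from the PT. JPW Indonesia data.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Information gain of splitting on `values`, divided by the split information."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the attribute's own value distribution
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy records: one attribute separates the classes, the other does not.
price = ["low", "low", "high", "high"]
quantity = ["small", "large", "small", "large"]
sales = ["up", "up", "down", "down"]

print(gain_ratio(price, sales), gain_ratio(quantity, sales))
```

In this toy data `price` separates the two classes perfectly while `quantity` carries no information, so a tree builder that picks the attribute with the highest gain ratio would split on `price` first.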


Author(s):  
T. Z. Ibragimov ◽  

Data mining methods were used to predict Septoria leaf blotch of wheat. A system has been developed that allows parallel forecasting on the same data set using an artificial neural network, a decision tree, and a naive Bayesian classifier. The system allows the user to interactively adjust the design parameters for each method, view the results obtained, and evaluate their effectiveness.


Author(s):  
Geert Wets ◽  
Koen Vanhoof ◽  
Theo Arentze ◽  
Harry Timmermans

The utility-maximizing framework—in particular, the logit model—is the dominantly used framework in transportation demand modeling. Computational process modeling has been introduced as an alternative approach to deal with the complexity of activity-based models of travel demand. Current rule-based systems, however, lack a methodology to derive rules from data. The relevance and performance of data-mining algorithms that potentially can provide the required methodology are explored. In particular, the C4 algorithm is applied to derive a decision tree for transport mode choice in the context of activity scheduling from a large activity diary data set. The algorithm is compared with both an alternative method of inducing decision trees (CHAID) and a logit model on the basis of goodness-of-fit on the same data set. The ratio of correctly predicted cases of a holdout sample is almost identical for the three methods. This suggests that for data sets of comparable complexity, the accuracy of predictions does not provide grounds for either rejecting or choosing the C4 method. However, the method may have advantages related to robustness. Future research is required to determine the ability of decision tree-based models in predicting behavioral change.


2015 ◽  
Vol 738-739 ◽  
pp. 191-196
Author(s):  
Yun Jie Li ◽  
Hui Song

In this paper, several data mining techniques are discussed and analyzed with the objective of recognizing human daily activities from a continuous sensing data set. The decision tree, Naïve Bayes, and neural network techniques were successfully applied to the data set. The paper also proposes combining the neural network with the decision tree; the results show that the combination works much better than either the typical neural network or the typical decision tree model.


2021 ◽  
Vol 5 (3) ◽  
pp. 1166
Author(s):  
Muchamad Sobri Sungkar ◽  
M Taufik Qurohman

Computer system architecture is one of the compulsory subjects in the informatics engineering study program. In the study program, whether each student passes the course is an important aspect that must be evaluated every semester. Each student's passing the course indicates that the learning process is going well and that the material presented by the lecturer in charge of the course can be digested by students. Whether a student passes the course can be predicted from the student's habit patterns. Data mining is an alternative process for discovering habit patterns from collected data. Data mining is an extraction process on a collection of data that produces valuable information for companies, agencies, or organizations to use in decision making. Predicting course completion with data mining can be treated as a classification problem on the data set. The C5.0 algorithm is an improvement on the C4.5 algorithm: the process is almost the same, but C5.0 has advantages over its predecessor. The result of the C5.0 algorithm is a decision tree or a rule set formed on the basis of entropy or gain values. The prediction is carried out by C5.0 classification using the attributes Attendance Value, Assignment Value, UTS (midterm exam) Value, and UAS (final exam) Value. The final result of the C5.0 classification process is a decision tree with its rules. The C5.0 algorithm achieves a high accuracy of 93.33%.


2007 ◽  
Vol 46 (05) ◽  
pp. 523-529 ◽  
Author(s):  
M. Saraee ◽  
B. Theodoulidis ◽  
J. A. Keane ◽  
C. Tjortjis

Summary Objectives: Medical data are a valuable resource from which novel and potentially useful knowledge can be discovered by using data mining. Data mining can assist and support medical decision making and enhance clinical management and investigative research. The objective of this work is to propose a method for building accurate descriptive and predictive models based on classification of past medical data. We also aim to compare this method with other well-established data mining methods and identify strengths and weaknesses. Method: We propose T3, a decision tree classifier which builds predictive models based on known classes by allowing for a certain amount of misclassification error in training in order to achieve better descriptive and predictive accuracy. We then experiment with a real medical data set on stroke, and various subsets, in order to identify strengths and weaknesses. We also compare performance with a very successful and well-established decision tree classifier. Results: T3 demonstrated impressive performance when predicting unseen cases of stroke, resulting in as little as 0.4% classification error, while the state-of-the-art decision tree classifier resulted in 33.6% classification error. Conclusions: This paper presents and evaluates T3, a classification algorithm that builds decision trees of depth at most three and results in high accuracy whilst keeping the tree size reasonably small. T3 demonstrates strong descriptive and predictive power without compromising simplicity and clarity. We evaluate T3 based on real stroke register data and compare it with C4.5, a well-known classification algorithm, showing that T3 produces significantly more accurate and readable classifiers.


Author(s):  
Anchal Dahiya ◽  
Pooja Mittal

After experiencing the hard times of the pandemic, we realized that a smart system that helps with automatic parking of vehicles could be of great help to society. This idea motivated the current work. IoT techniques are now a buzzword in almost every application domain, and they can also be used to predict free parking space in advance. The biggest challenge with IoT techniques is that they generate large volumes of data, which makes analysis difficult. If IoT techniques are fused with well-performing data mining techniques, more efficient predictions can be made. The main objective of our paper is therefore, first, to select the most appropriate data mining technique based on performance evaluation, and then to predict available parking space in advance by fusing it with IoT techniques. Because of their busy schedules, drivers need information about free parking spaces in advance on their smartphones. With this information, drivers can park their vehicles in the exact location without wasting time and can maintain social distancing in crowded areas. Data mining techniques can play an important role in predicting available parking space by extracting only the relevant and important information from the given data set. For this purpose, a comparative analysis of five data mining techniques (Support Vector Machine, K-Nearest Neighbors, Decision Tree, Random Forest, and ensemble learning) is carried out on the PK lot data set using Python, with Anaconda (Spyder) as a supporting tool. The main outcome of the paper is to identify the technique that gives the best results for predicting available space; fusing data mining techniques with IoT technologies improves the results. The evaluation parameters used to find the best technique are precision, recall, accuracy, and F1-score. The k-fold cross-validation method is used for the numerical calculation of the results. In the empirical results on the PK lot data set, the decision tree performed best among all the techniques selected for analysis.
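The evaluation procedure named above can be sketched in plain Python: a k-fold index generator plus the four metrics for a binary "space free / occupied" label. This is an illustrative sketch, not the authors' code; the helper names `k_fold_indices` and `binary_metrics` and the example label vectors are assumptions.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F1-score and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical labels for one fold: 1 = space free, 0 = space occupied.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
folds = list(k_fold_indices(10, 3))
print(metrics, [len(test) for _, test in folds])
```

In a full comparison, each candidate classifier would be trained on `train_idx` and scored on `test_idx` for every fold, and the per-fold metrics averaged before ranking the techniques.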

