Predictive Decision Support System using Logistic Regression and Decision Tree Model Combination for Student Graduation Success Determination

More recently, researchers and higher education institutions have begun to explore the potential of data mining for analyzing academic data. The goal is to find ways to improve the services these institutions provide and to enhance instruction. This type of data mining application is popularly known as educational data mining (EDM). At present, EDM focuses on developing tools that discover patterns in academic data; it is concerned with exploring large amounts of data to identify patterns in the microconcepts involved in learning. This area of EDM is often referred to as Learning Analytics, at least as it is commonly compared with more prominent data mining approaches that process data from large repositories for better decision-making. One main topic in educational data mining is student graduation. In the Philippines, according to the National Statistics Office, there is an imbalance between student enrolment and student graduation: almost half of first-time, full-time freshmen who began seeking a bachelor’s degree do not graduate on time. This indicates the need for research that builds models to help improve the situation. The study focused on extracting hidden patterns from the data set using logistic regression and decision tree algorithms, so that students at risk of not graduating on time can be identified early and the administration can implement proper retention policies and measures.

Tool Fault Analysis with Decision Tree Induction and Sequence Mining

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.548-549.703 ◽

2014 ◽

Vol 548-549 ◽

pp. 703-707

Author(s):

Kittisak Kerdprasop ◽

Nittaya Kerdprasop

Keyword(s):

Data Mining ◽

Decision Tree ◽

Fault Analysis ◽

Sequence Mining ◽

Process Data ◽

Statistical Process ◽

Monitoring Method ◽

Data Set ◽

Standard Data ◽

Decision Tree Induction

Tool fault analysis is a common task for process engineers in modern industries seeking to maintain high yields of final products. Statistical process control is the monitoring method most engineers normally adopt. Recently, industrial and manufacturing engineers have become increasingly aware that intelligent techniques from the data mining and machine learning fields can be applied to discover subtle patterns in manufacturing process data. In this paper, we present two data mining techniques, decision tree induction and sequence mining, for discovering frequently occurring patterns in low-performance wafer lots in the semiconductor manufacturing industry. Comparative results for both techniques are presented through experiments on a standard data set so that the experiments can be repeated.

AN EFFICIENT MACHINE LEARNING MODEL FOR PREDICTION OF ACUTE MYOCARDIAL INFARCTION

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813666200325104317 ◽

2020 ◽

Vol 13 ◽

Author(s):

Dhilsath Fathima.M ◽

S. Justin Samuel ◽

R. Hari Haran

Keyword(s):

Machine Learning ◽

Myocardial Infarction ◽

Acute Myocardial Infarction ◽

Logistic Regression ◽

Decision Tree ◽

Learning Model ◽

Training Dataset ◽

Data Set ◽

Machine Learning Model ◽

Proposed Model

Aim: This work develops an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine-learning-based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed model is intended to support medical professionals in predicting myocardial infarction. Methods: The proposed model uses mean imputation to fill in missing values in the data set, then applies principal component analysis (PCA) to extract features and enhance classifier performance. After PCA, the reduced data are partitioned into a training dataset and a testing dataset: 70% of the data is used to train four popular classifiers (support vector machine, k-nearest neighbor, logistic regression, and decision tree), and the remaining 30% is used to evaluate the models with a confusion matrix, classifier accuracy, precision, sensitivity, F1-score, and the AUC-ROC curve. Results: The classifier outputs were evaluated using these performance measures. Logistic regression provides higher accuracy than the k-NN, SVM, and decision tree classifiers, and PCA performs well as a feature extraction method for the proposed model. From these analyses, we conclude that logistic regression has a good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves in Figures 4 and 5 show that logistic regression exhibits a good AUC-ROC score, around 70%, compared with the k-NN and decision tree algorithms.
Conclusion: From the result analysis, we infer that the proposed machine learning model can act as an optimal decision-making system that predicts acute myocardial infarction at an earlier stage than existing machine-learning-based prediction models. It can predict the presence of acute myocardial infarction in a person from heart disease risk factors, helping to decide when to start lifestyle modification and medical treatment to prevent heart disease.
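
The first and third steps of the Methods above (mean imputation and the 70/30 partition) can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the toy records, the column meanings, and the function names `impute_mean` and `split_70_30` are hypothetical and do not come from the paper, and the PCA and classifier-training steps are deliberately left out.

```python
import random
from statistics import mean


def impute_mean(rows):
    """Replace None entries in each column with the mean of that column's observed values."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        col_means.append(mean(observed))
    return [[col_means[j] if r[j] is None else r[j] for j in range(n_cols)]
            for r in rows]


def split_70_30(rows, labels, train_fraction=0.7, seed=0):
    """Shuffle the record indices reproducibly, then cut them into train and test parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(round(train_fraction * len(idx)))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


# Hypothetical toy records: [age, systolic blood pressure]; None marks a missing value.
X = [[50, 120.0], [61, None], [45, 130.0], [None, 150.0], [58, 140.0],
     [39, 110.0], [66, None], [52, 125.0], [47, 135.0], [68, 160.0]]
y = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]  # hypothetical outcome labels

X_filled = impute_mean(X)
X_train, y_train, X_test, y_test = split_70_30(X_filled, y)
print(len(X_train), len(X_test))
```

Fixing the shuffle seed keeps the partition reproducible, which matters when several classifiers are compared on the same split.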

Comparing Data Mining Models in Academic Analytics

Psychology and Mental Health ◽

10.4018/978-1-5225-0159-6.ch040 ◽

2016 ◽

pp. 970-987

Author(s):

Dheeraj Raju ◽

Randall Schumacker

Keyword(s):

Data Mining ◽

Full Time ◽

Students At Risk ◽

Academic Analytics ◽

Institutional Researchers ◽

Forest Models ◽

Random Forest Models ◽

Student Graduation ◽

Misclassification Rates ◽

First Time

The goal of this research study was to compare data mining techniques for predicting student graduation. The data included demographics, high school, ACT profile, and college indicators from 1995-2005 for first-time, full-time freshman students with a six-year graduation timeline at a flagship university in the southeastern United States. The results indicated no difference in misclassification rates between logistic regression, decision tree, neural network, and random forest models. The results suggest that institutional researchers should build and compare different data mining models and choose the best one based on its advantages. The results can be used to identify students at risk and help these students graduate.

Penerapan Educational Data Mining Untuk Memprediksi Hasil Belajar Siswa SMAK Ora et Labora [Applying Educational Data Mining to Predict the Learning Outcomes of SMAK Ora et Labora Students]

Jurnal Ilmu Komputer ◽

10.24843/jik.2019.v12.i02.p02 ◽

2019 ◽

Vol 12 (2) ◽

pp. 73

Author(s):

Daniel David

Keyword(s):

Data Mining ◽

Decision Tree ◽

Educational Data Mining ◽

Internal Assessment

Data mining is one approach to extracting new information from large volumes of data. One branch of data mining is Educational Data Mining (EDM), which operates in the field of education. By using education-related data, data mining can uncover information that is useful for advancing education. This study uses EDM to exploit the internal assessment data of each student at the school and to predict that student's national final examination results. The data mining uses classification with the C4.5 decision tree method. A descriptive research method is also used in order to give more accurate results. The study is expected to contribute predictions of national final examination results that can be used for subsequent student cohorts.

Data-Driven Decision Tree Classification for Product Portfolio Design Optimization

Journal of Computing and Information Science in Engineering ◽

10.1115/1.3243634 ◽

2009 ◽

Vol 9 (4) ◽

Cited By ~ 25

Author(s):

Conrad S. Tucker ◽

Harrison M. Kim

Keyword(s):

Data Mining ◽

Decision Tree ◽

Engineering Design ◽

Optimization Techniques ◽

Product Portfolio ◽

Performance Expectations ◽

Data Set ◽

Tree Data ◽

Portfolio Design ◽

Product Concepts

The formulation of a product portfolio requires extensive knowledge about the product market space and also the technical limitations of a company’s engineering design and manufacturing processes. A design methodology is presented that significantly enhances the product portfolio design process by eliminating the need for an exhaustive search of all possible product concepts. This is achieved through a decision tree data mining technique that generates a set of product concepts that are subsequently validated in the engineering design using multilevel optimization techniques. The final optimal product portfolio evaluates products based on the following three criteria: (1) it must satisfy customer price and performance expectations (based on the predictive model) defined here as the feasibility criterion; (2) the feasible set of products/variants validated at the engineering level must generate positive profit that we define as the optimality criterion; (3) the optimal set of products/variants should be a manageable size as defined by the enterprise decision makers and should therefore not exceed the product portfolio limit. The strength of our work is to reveal the tremendous savings in time and resources that exist when decision tree data mining techniques are incorporated into the product portfolio design and selection process. Using data mining tree generation techniques, a customer data set of 40,000 responses with 576 unique attribute combinations (entire set of possible product concepts) is narrowed down to 46 product concepts and then validated through the multilevel engineering design response of feasible products. A cell phone example is presented and an optimal product portfolio solution is achieved that maximizes company profit, without violating customer product performance expectations.

CUDT: A CUDA Based Decision Tree Algorithm

The Scientific World JOURNAL ◽

10.1155/2014/745640 ◽

2014 ◽

Vol 2014 ◽

pp. 1-12 ◽

Cited By ~ 18

Author(s):

Win-Tsung Lo ◽

Yue-Shan Chang ◽

Ruey-Kai Sheu ◽

Chun-Chieh Chiu ◽

Shyan-Ming Yuan

Keyword(s):

Data Mining ◽

Decision Tree ◽

New Technology ◽

Large Data ◽

Decision Tree Algorithm ◽

Data Set ◽

Tree Algorithm ◽

Ubiquitous Sensing ◽

Device Architecture ◽

Huge Data

The decision tree is one of the best-known classification methods in data mining. Much research has been proposed to improve decision tree performance. However, those algorithms are developed and run on traditional distributed systems. Without the help of new technology, latency cannot be improved when processing the huge data generated by ubiquitous sensing nodes. To improve data processing latency in mining huge data, in this paper we design and implement a new parallelized decision tree algorithm on CUDA (compute unified device architecture), a GPGPU solution provided by NVIDIA. In the proposed system, the CPU is responsible for flow control while the GPU is responsible for computation. We have conducted many experiments to evaluate the system performance of CUDT and compared it with a traditional CPU version. The results show that CUDT is 5 to 55 times faster than Weka-j48 and achieves an 18-fold speedup over SPRINT for large data sets.

Predicting voluntary turnover through human resources database analysis

Management Research Review ◽

10.1108/mrr-04-2017-0098 ◽

2018 ◽

Vol 41 (1) ◽

pp. 96-112 ◽

Cited By ~ 3

Author(s):

Evy Rombaut ◽

Marie-Anne Guerry

Keyword(s):

Logistic Regression ◽

Decision Tree ◽

Human Resources ◽

Real Life ◽

Model Performance ◽

Voluntary Turnover ◽

Private Company ◽

Data Set ◽

Content Type ◽

Individual Level

Purpose: This paper aims to question whether the available data in the human resources (HR) system could result in reliable turnover predictions without supplementary survey information. Design/methodology/approach: A decision tree approach and a logistic regression model for analysing turnover were introduced. The methodology is illustrated on a real-life data set of a Belgian branch of a private company. The model performance is evaluated by the area under the ROC curve (AUC) measure. Findings: It was concluded that data in the personnel system indeed lead to valuable predictions of turnover. Practical implications: The presented approach brings determinants of voluntary turnover to the surface. The results yield useful information for HR departments. Where the logistic regression results in a turnover probability at the individual level, the decision tree makes it possible to ascertain employee groups that are at risk for turnover. With the data set-based approach, each company can immediately ascertain its own turnover risk. Originality/value: The study of a data-driven approach for turnover investigation has not been done so far.
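
The AUC measure named above has a simple pairwise reading: it is the probability that a randomly chosen employee who left receives a higher predicted risk than a randomly chosen employee who stayed. The sketch below computes it that way in plain Python. The function name `auc_score` and the toy labels and risk scores are hypothetical illustrations, not data or code from the study.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    case gets the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = employee left voluntarily, 0 = employee stayed.
left = [1, 0, 1, 0, 0, 1]
# Hypothetical predicted turnover probabilities from some model.
risk = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]

print(round(auc_score(left, risk), 3))
```

The pairwise loop is quadratic in the number of records, which is fine for a small illustration; a rank-based formula gives the same value more cheaply on large data sets.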

Student Performance Predictions Using Knowledge Discovery Database and Data Mining, DPU Students Records as Sample

Academic Journal of Nawroz University ◽

10.25007/ajnu.v10n3a875 ◽

2021 ◽

Vol 10 (3) ◽

pp. 121-127

Author(s):

Bareen Haval ◽

Karwan Jameel Abdulrahman ◽

Araz Rajab

Keyword(s):

Data Mining ◽

Decision Tree ◽

Student Performance ◽

Educational Data Mining ◽

Data Sets ◽

Decision Tree Classifier ◽

Data Mining Techniques ◽

Academic History ◽

Tree Classifier ◽

Using Data

This article presents the results of applying educational data mining techniques to students' academic performance. Three classification models (Decision Tree, Random Forest, and Deep Learning) were developed to analyze the data sets and predict student performance. The predictive performance of the three classifiers was computed and compared. The students' academic history and data from the Office of the Registrar were used to train the models. Our analysis aims to evaluate student results using variables such as the student's grade. Data from 221 students with 9 attributes were used. The results provide a better understanding of student success assessment and underline the importance of data mining in education. The main purpose of this study is to show that student success can be forecast using data mining techniques in order to improve academic programs. The results indicate that the Decision Tree classifier outperforms the other two classifiers, achieving an overall prediction accuracy of 97%.

Developed third iterative dichotomizer based on feature decisive values for educational data mining

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v18.i1.pp209-217 ◽

2020 ◽

Vol 18 (1) ◽

pp. 209

Author(s):

Saja Taha Ahmed ◽

Rafah Al-Hamdani ◽

Muayad Sadik Croock

Keyword(s):

Data Mining ◽

Feature Selection ◽

Decision Tree ◽

Predictive Analytics ◽

Educational Data Mining ◽

Target Class ◽

Id3 Algorithm ◽

Feature Weight ◽

Holdout Validation ◽

Fold Cross Validation

Recently, decision trees have become one of the most widely used classification models. They owe their popularity to their efficiency in predictive analytics, their ease of interpretation, and the feature selection they perform implicitly. This last property is of essential significance in Educational Data Mining (EDM), where selecting the most relevant features has a major impact on classification accuracy. The main contribution is a new multi-objective decision tree that can be used for both feature selection and classification. The proposed Decisive Decision Tree (DDT) is constructed based on a decisive feature value that serves as a feature weight related to the target class label. The traditional Iterative Dichotomizer 3 (ID3) algorithm and the proposed DDT are compared on three datasets with respect to known ID3 issues, including the complexity of logarithmic calculation and the selection of multi-valued features. The results indicate that the proposed DDT outperforms ID3 in development time. Classification accuracy is improved under 10-fold cross-validation for all datasets, with the highest accuracy of the proposed method being 92% on the student.por dataset, and under holdout validation for two datasets, Iraqi and Student-Math. The experiment also shows that the proposed DDT tends to select attributes that are important rather than multi-valued.
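
The two ID3 issues mentioned above, logarithmic calculation and the preference for multi-valued features, both come from ID3's information gain criterion. The sketch below implements that standard criterion only; it is not the proposed DDT weighting, whose formula the abstract does not give. The toy attributes (`studytime`, `student_id`) and labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """ID3 criterion: entropy of the labels minus the weighted entropy after splitting."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: did the student pass?
passed = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
# A two-valued attribute that is informative but imperfect.
studytime = ["high", "high", "low", "low", "high", "low", "low", "high"]
# A many-valued attribute that is unique per record and useless for prediction.
student_id = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]

print(round(information_gain(studytime, passed), 4))
print(round(information_gain(student_id, passed), 4))
```

Because every `student_id` value isolates a single record, each branch is pure and the attribute receives the maximum possible gain even though it cannot generalize. This is the multi-valued bias that alternatives to plain ID3 try to avoid.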

Prediksi Kelulusan Mata Kuliah Menggunakan Hybrid Fuzzy Inference System [Predicting Course Passing Using a Hybrid Fuzzy Inference System]

10.26594/r.v2i2.548 ◽

2016 ◽

Vol 2 (2) ◽

pp. 60

Author(s):

Abidatul Izzah ◽

Ratna Widyastuti

Keyword(s):

Data Mining ◽

Decision Tree ◽

Fuzzy Inference System ◽

Fuzzy Inference ◽

Educational Data Mining ◽

Fuzzy Rule ◽

Posttest Score ◽

Inference System ◽

Is Implementation ◽

Attention To Students

Abstract: Higher education institutions hold data that become highly informative when mined properly. Predicting whether students will pass is a widely studied problem in higher education. With a prediction of each student's pass status available in the middle of the semester, lecturers can anticipate problems or give special attention to students predicted to fail. The methods used vary widely and include the Fuzzy Inference System (FIS). In practice, however, fuzzy rules are often generated randomly or from expert knowledge, so they do not represent the data distribution. This study therefore uses a Decision Tree (DT) technique to generate the rules, with the aim of predicting course passing using a hybrid of FIS and DT. The data set consists of posttest, assignment, quiz, and midterm exam scores from 106 students at Politeknik Kediri who took the Algorithms and Data Structures course. The study begins by generating 5 rules with the decision tree, which are then used for inference. The FIS is then implemented in three stages: fuzzification, inference, and defuzzification. The resulting accuracy, sensitivity, and specificity are 94.33%, 96.55%, and 84.21%, respectively.
Keywords: Decision Tree, Educational Data Mining, Fuzzy Inference System, Prediction.
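
The three reported measures are all ratios over a confusion matrix, and a short sketch makes the definitions concrete. The abstract does not report the underlying counts, so the counts used below (84 true positives, 3 false negatives, 16 true negatives, 3 false positives) are an assumption: they are one set of values that sums to 106 students and is consistent with the reported percentages. The function name `classification_metrics` is likewise hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,   # share of all students classified correctly
        "sensitivity": tp / (tp + fn),   # share of actual positives that were detected
        "specificity": tn / (tn + fp),   # share of actual negatives that were detected
    }


# Assumed counts (not given in the abstract), chosen to total 106 students.
metrics = classification_metrics(tp=84, fn=3, tn=16, fp=3)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these assumed counts the sketch prints 94.34%, 96.55%, and 84.21%. The accuracy differs from the reported 94.33% only in the last digit, which would be consistent with truncating 94.3396% rather than rounding it.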
