Prediction of Student Dropout in Malaysian’s Private Higher Education Institute using Data Mining Application

Author(s):  
Nurhana Roslan et al.

Student dropout is a major concern for university academics and management. A high dropout rate harms the university's reputation, reduces student enrollment, affects university revenue, causes financial losses for the country, and contributes to social problems among students. In this study, two popular classifiers, a decision tree and a logistic regression model, were used to predict student dropout. Several experimental settings were employed, including three data partitioning schemes combined with different types of decision tree and regression model. For the logistic regression model, different data imputation and transformation methods were tested to ensure that the resulting model was valid. A total of 7706 student records, extracted from the database of a private university in Malaysia (2018-2019), were used to assess the capability of the classifiers. Classifier performance was evaluated using accuracy and misclassification rate. The results indicate that the chi-square decision tree (2 branches) achieved slightly better classification performance, 89.49%, on the 80/20 data partition. The chosen model also identified the most important variable for accurate prediction of student dropout. Applying this model has the potential to identify at-risk students accurately and to reduce student dropout rates.
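The 80/20 partition and the two performance measures named in the abstract can be sketched as follows. This is a minimal illustration only: the helper names, the fixed random seed, and the placeholder record IDs are assumptions and are not taken from the study, which does not describe its tooling at this level.

```python
import random

def partition(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and split it into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary assumption
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def misclassification_rate(actual, predicted):
    """Complement of accuracy: fraction of predictions that are wrong."""
    return 1.0 - accuracy(actual, predicted)

# Placeholder IDs standing in for the 7706 student records (not real data).
train, test = partition(range(7706))
print(len(train), len(test))  # 6164 1542
```

Because the two measures are complements, an accuracy of 89.49% would correspond to a misclassification rate of 10.51%, assuming the reported figure is an accuracy.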

2020 ◽  
Vol 93 (1112) ◽  
pp. 20190891
Author(s):  
Xiaoying Xing ◽  
Jiahui Zhang ◽  
Yongye Chen ◽  
Qiang Zhao ◽  
Ning Lang ◽  
...  

Objective: To explore the value of related parameters in monoexponential, biexponential, and stretched-exponential models of diffusion-weighted imaging (DWI) in differentiating metastases and myeloma in the spine. Methods: 53 metastases and 16 myeloma patients underwent MRI with 10 b-values (0–1500 s/mm²). Parameters of apparent diffusion coefficient (ADC), true diffusion coefficient (D), pseudo-diffusion coefficient (D*), perfusion fraction (f), the distribution diffusion coefficient (DDC), and intravoxel water diffusion heterogeneity (α) from DWI were calculated. The independent sample t test and the Mann–Whitney U test were used to compare the parameter values between the two groups. Receiver operating characteristic (ROC) curve analysis was used to assess diagnostic efficacy. Each parameter was then entered into a decision tree model and a logistic regression model to identify meaningful parameters and evaluate their joint diagnostic performance. Results: The ADC, D, and α values of metastases were higher than those of myeloma, whereas the D* value was lower than that of myeloma, and the differences were significant (p < 0.05); the area under the ROC curve for the above parameters was 0.661, 0.710, 0.781, and 0.743, respectively. There was no significant difference in the f and DDC values (p > 0.05). D and α were found to conform to the decision tree model, and the accuracy of model diagnosis was 84.1%. ADC and α were found to conform to the logistic regression model, and the accuracy was 87.0%. Conclusion: The three DWI models have some value in differentiating metastases and myeloma in the spine, and the diagnostic performance of ADC, D, α, and D* was better. Combining ADC with α may markedly aid in the differential diagnosis of the two. Advances in knowledge: Monoexponential, biexponential, and stretched-exponential models can offer additional information in the differential diagnosis of metastases and myeloma in the spine. Decision tree and logistic regression models are effective methods to help further distinguish the two.
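The Mann–Whitney U test and the area under the ROC curve mentioned above are closely related: for a single parameter, the AUC equals the U statistic of one group divided by the number of cross-group pairs. The sketch below illustrates that identity. The function name and the example values are hypothetical and are not data from the study.

```python
def auc_from_scores(positive_scores, negative_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    The running total is the Mann-Whitney U statistic of the positive
    group; ties contribute half a pair.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both groups must contain at least one score")
    u_statistic = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                u_statistic += 1.0
            elif p == n:
                u_statistic += 0.5
    return u_statistic / (len(positive_scores) * len(negative_scores))

# Hypothetical parameter values for two groups (illustration only).
group_a = [3.0, 4.0, 5.0]
group_b = [1.0, 2.0, 3.0]
print(round(auc_from_scores(group_a, group_b), 3))
```

Which group is treated as "positive" is a convention; swapping the groups gives 1 minus the AUC.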


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Zhaosheng Yang ◽  
Xiujuan Tian ◽  
Wei Wang ◽  
Xiyang Zhou ◽  
Hongmei Liang

Vehicles are often caught in the dilemma zone when they approach signalized intersections during the yellow interval. The existence of the dilemma zone, which is significantly influenced by driver behavior, seriously affects the efficiency and safety of intersections. This paper proposes driver behavior models for the yellow interval using logistic regression and fuzzy decision tree modeling, respectively, based on camera image data. Vehicle speed and distance to the stop line are considered in the logistic regression model, which also includes a dummy variable describing whether a countdown timer display is installed. The fuzzy decision tree model is generated by the FID3 algorithm, whose heuristic information is fuzzy information entropy based on membership functions. This paper concludes that the fuzzy decision tree describes driver behavior at signalized intersections more accurately than the logistic regression model.
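Fuzzy information entropy based on membership functions can be sketched as below. This follows one common fuzzy ID3 formulation, in which class proportions are computed from summed membership degrees rather than from counts; the abstract does not specify the exact variant the authors used. The triangular membership function, its breakpoints, and the sample speeds and decisions are all hypothetical.

```python
import math

def triangular(x, left, peak, right):
    """Triangular membership degree in [0, 1]; assumes left < peak < right."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def fuzzy_entropy(memberships, labels):
    """Entropy of the class labels, weighted by membership in one fuzzy set."""
    total = sum(memberships)
    if total == 0:
        return 0.0
    entropy = 0.0
    for cls in set(labels):
        weight = sum(m for m, lab in zip(memberships, labels) if lab == cls)
        p = weight / total
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy

# Hypothetical approach speeds (km/h) and observed driver decisions.
speeds = [35.0, 45.0, 55.0, 65.0]
decisions = ["stop", "stop", "go", "go"]
# Hypothetical "medium speed" fuzzy set with breakpoints 30, 50, 70 km/h.
medium = [triangular(s, 30.0, 50.0, 70.0) for s in speeds]
print(fuzzy_entropy(medium, decisions))
```

A tree builder would compute this entropy for each fuzzy set of each candidate attribute and prefer the attribute whose membership-weighted average entropy is lowest.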


2021 ◽  
Vol 10 (44) ◽  
pp. 3736-3741
Author(s):  
Soraya Siabani ◽  
Leila Solouki ◽  
Mehdi Moradinazar ◽  
Farid Najafi ◽  
Ebrahim Shakiba

BACKGROUND Given the global burden of COVID-19 mortality, this study intended to determine the factors affecting mortality in patients with COVID-19 using decision tree analysis and logistic regression model in Kermanshah province, 2020. METHODS This cross-sectional study was conducted on 7799 patients with COVID-19 admitted to the hospitals of Kermanshah province. Data gathered from February 18 to July 9, 2020, were obtained from the vice-chancellor for the health of Kermanshah University of Medical Sciences. The performance of the models was compared according to the sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve. RESULTS According to the decision tree model, the most important risk factors for death due to COVID-19 were age, body temperature, admission to intensive care unit (ICU), prior hospital visit within the last 14 days, and cardiovascular disease. Also, the multivariate logistic regression model showed that the variables of age [OR = 4.47, 95 % CI: (3.16-6.32)], shortness of breath [OR = 1.42, 95 % CI: (1.0-2.01)], ICU admission [OR = 3.75, 95 % CI: (2.47-5.68)], abnormal chest X-ray [OR = 1.93, 95 % CI: (1.06-3.41)], liver disease [OR = 5.05, 95 % CI: (1.020-25.2)], body temperature [OR = 4.93, 95 % CI: (2.17-6.25)], and cardiovascular disease [OR = 2.15, 95 % CI: (1.27-3.06)] were significantly associated with the higher mortality of patients with COVID-19. The area under the ROC curve for the decision tree model and logistic regression was 0.77 and 0.75, respectively. CONCLUSIONS Identifying risk factors for mortality in patients with COVID-19 can provide more effective interventions in the early stages of treatment and improve the medical approaches provided by the medical staff. KEY WORDS COVID-19, Decision Tree, Logistic Regression, Mortality, Risk Factor
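An odds ratio with its 95 % confidence interval can be mapped back to the logistic regression coefficient and an approximate standard error. The sketch below assumes a Wald-type interval, symmetric on the log-odds scale; the abstract does not state how the intervals were computed, so that is an assumption, as is the function name. The inputs are the reported values for age.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95 % interval

def log_odds_summary(odds_ratio, ci_lower, ci_upper):
    """Return (coefficient, standard error) implied by an OR and its 95 % CI.

    Assumes a Wald interval: exp(beta - z * se) to exp(beta + z * se).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z_95)
    return beta, se

# Reported values for age: OR = 4.47, 95 % CI (3.16-6.32).
beta, se = log_odds_summary(4.47, 3.16, 6.32)
print(round(beta, 3), round(se, 3))
```

Under the same assumption, the geometric mean of the interval bounds should be close to the odds ratio, which gives a quick consistency check on a reported triple.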


2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Siyu Liu ◽  
Yue Gao ◽  
Yuhang Shen ◽  
Min Zhang ◽  
Jingjing Li ◽  
...  

Abstract Background At present, the proportion of undiagnosed diabetes in Chinese adults is as high as 15.5%. People with diabetes who are not treated and controlled in time may have various complications, such as cardiovascular and cerebrovascular diseases and diabetic foot disorders, which not only seriously affect the quality of life of people with diabetes but also impose a heavy burden on families and society. Therefore, prevention and control of type 2 diabetes is of great significance. Methods We constructed a logistic regression model, a neural network model and a decision tree model to analyse the risk factors for type 2 diabetes and then compared the prediction accuracy of the different models by calculating the area under the relative operating characteristic (ROC) curve and back-inputting the data into the model. Results The prevalence of type 2 diabetes in 4177 subjects who were not diagnosed with type 2 diabetes was 9.31%. The most influential factors associated with type 2 diabetes were triglyceride (TG) ≥ 1.17 mmol/L (odds ratio (OR) = 2.233), age ≥ 70 years (OR = 1.734), hypertension (OR = 1.703), alcohol consumption (OR = 1.674), and total cholesterol (TC) ≥ 5.2 mmol/L (OR = 1.463). The prediction accuracies of the three prediction models were 90.8, 91.2, and 90.7%, respectively, and the areas under the curve (AUCs) were 0.711, 0.780, and 0.698, respectively. The differences in the AUCs after back propagation (BP) of the neural network model, logistic regression model and decision tree model were statistically significant (P < 0.05). Conclusion BP neural networks have a higher predictive power for identifying the associated risk factors of type 2 diabetes than the other two models, but it is necessary to select a suitable model for specific situations.
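The abstract reports both accuracy and AUC. As a rough arithmetic aside, the sketch below computes the accuracy a trivial majority-class rule would reach at the reported 9.31% prevalence and compares it with the three reported accuracies. It assumes the accuracies were measured on a sample with that same prevalence, which the abstract does not state explicitly. The function name is hypothetical.

```python
def majority_class_accuracy(prevalence):
    """Accuracy of always predicting the more common class."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    return max(prevalence, 1.0 - prevalence)

# Reported prevalence of type 2 diabetes in the study sample.
baseline = majority_class_accuracy(0.0931)

# The three reported prediction accuracies, in the order given in the abstract.
reported = [0.908, 0.912, 0.907]
margins = [round(acc - baseline, 4) for acc in reported]
print(round(baseline, 4), margins)
```

With a minority class this small, accuracy alone separates models only slightly, which is one reason a ranking-based measure such as the AUC is often reported alongside it.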

