Comparing Data Mining Models in Academic Analytics

2016 ◽  
pp. 970-987
Author(s):  
Dheeraj Raju ◽  
Randall Schumacker

The goal of this research study was to compare data mining techniques for predicting student graduation. The data included demographics, high school, ACT profile, and college indicators from 1995-2005 for first-time, full-time freshman students with a six-year graduation timeline at a flagship university in the southeastern United States. The results indicated no difference in misclassification rates between logistic regression, decision tree, neural network, and random forest models. The results suggest that institutional researchers should build and compare different data mining models and choose the best one based on its advantages. The results can be used to identify at-risk students and help these students graduate.
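The comparison criterion named in the abstract, the misclassification rate, can be sketched as follows. This is a minimal illustration only: the labels and the four sets of predictions are hypothetical values invented for the example, not data or results from the study.

```python
def misclassification_rate(y_true, y_pred):
    """Fraction of cases where the predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)


# Hypothetical held-out labels (1 = graduated within six years, 0 = did not).
y_true = [1, 0, 1, 1, 0, 1, 0, 1]

# Hypothetical predictions from four already-fitted models (assumed, not from the study).
predictions = {
    "logistic_regression": [1, 0, 1, 0, 0, 1, 1, 1],
    "decision_tree":       [1, 1, 1, 1, 0, 0, 0, 1],
    "neural_network":      [1, 0, 0, 1, 0, 1, 1, 1],
    "random_forest":       [0, 0, 1, 1, 0, 1, 0, 0],
}

# Score every model on the same held-out cases so the rates are comparable.
rates = {name: misclassification_rate(y_true, y_pred)
         for name, y_pred in predictions.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
```

In this toy data every model makes two errors on eight cases, so the rates tie even though the models err on different students. That is one reason a choice between models might rest on other advantages, such as interpretability.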

2016 ◽  
Vol 6 (2) ◽  
pp. 38-54 ◽  
Author(s):  
Dheeraj Raju ◽  
Randall Schumacker

The goal of this research study was to compare data mining techniques for predicting student graduation. The data included demographics, high school, ACT profile, and college indicators from 1995-2005 for first-time, full-time freshman students with a six-year graduation timeline at a flagship university in the southeastern United States. The results indicated no difference in misclassification rates between logistic regression, decision tree, neural network, and random forest models. The results suggest that institutional researchers should build and compare different data mining models and choose the best one based on its advantages. The results can be used to identify at-risk students and help these students graduate.


2015 ◽  
Vol 2 ◽  
pp. 144-153
Author(s):  
Ace C. Lagman

More recently, researchers and higher education institutions have begun to explore the potential of data mining for analyzing academic data. The goal of such work is to improve the services these institutions provide and to enhance instruction. This type of data mining application is commonly known as educational data mining (EDM). At present, EDM focuses on developing tools that discover patterns in academic data; it is concerned with exploring large amounts of data to identify patterns in the microconcepts involved in learning. This area of EDM is often referred to as Learning Analytics, at least as it is commonly compared to more prominent data mining approaches that process data from large repositories for better decision-making. One main topic in educational data mining is student graduation. In the Philippines, according to the National Statistics Office, there is an imbalance between student enrolment and student graduation. Almost half of the first-time, full-time freshmen who began seeking a bachelor’s degree do not graduate on time. This indicates the need for research that builds models to help improve the situation. The study used logistic regression and decision tree algorithms to extract hidden patterns from the data set, so that students at risk of not graduating on time can be identified early and the administration can implement proper retention policies and measures.


Author(s):  
Ace C. Lagman ◽  
Lourwel P. Alfonso ◽  
Marie Luvett I. Goh ◽  
Jay-ar P. Lalata ◽  
...  

According to the National Center for Education Statistics, almost half of the first-time, full-time freshmen who began seeking a bachelor’s degree do not graduate. The imbalance between


2015 ◽  
Vol 2 ◽  
pp. 104-110
Author(s):  
Ace C. Lagman

Logistic regression is a predictive modeling technique that finds an association between the independent variables and the logarithm of the odds of a categorical response variable. It is one of the techniques used to analyze a categorical dependent variable. The study applied logistic regression to predicting student graduation, generating models that identify early those students who are at risk of not graduating on time, so that institutions can formulate and implement proper remediation and retention policies. The student graduation rate is the percentage of a school’s first-time, first-year undergraduate students who complete their program successfully. Many first-year freshmen enrolled at the tertiary level fail to graduate. According to the National Center for Education Statistics, almost half of the first-time, full-time freshmen who began seeking a bachelor’s degree do not graduate. Colleges and universities with high leaver rates lose fee income and potential alumni contributors.
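The link between the independent variables and the log-odds can be sketched in a few lines. The intercept, the coefficients, and the two predictors (high school GPA and ACT score) below are hypothetical values chosen for illustration; they are not estimates from the study.

```python
import math


def log_odds(intercept, coefficients, features):
    """Linear predictor: the logarithm of the odds of the positive outcome."""
    return intercept + sum(b * x for b, x in zip(coefficients, features))


def predicted_probability(intercept, coefficients, features):
    """Map the log-odds to a probability with the logistic (sigmoid) function."""
    z = log_odds(intercept, coefficients, features)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical model: intercept, then weights for high school GPA and ACT score.
INTERCEPT = -4.0
COEFFICIENTS = [1.0, 0.05]

# Hypothetical student with GPA 3.0 and ACT 20: log-odds = -4 + 3 + 1 = 0.
student = [3.0, 20.0]
z = log_odds(INTERCEPT, COEFFICIENTS, student)
p = predicted_probability(INTERCEPT, COEFFICIENTS, student)
print(f"log-odds = {z:.2f}, P(graduate on time) = {p:.2f}")
```

A log-odds of zero corresponds to even odds, that is, a probability of 0.5. Each one-unit increase in a predictor shifts the log-odds by that predictor's coefficient.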


Atmosphere ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 109
Author(s):  
Ashima Malik ◽  
Megha Rajam Rao ◽  
Nandini Puppala ◽  
Prathusha Koouri ◽  
Venkata Anil Kumar Thota ◽  
...  

Over the years, rampant wildfires have plagued the state of California, causing economic and environmental loss. In 2018, wildfires cost nearly 800 million dollars in economic loss and claimed more than 100 lives in California. Over 1.6 million acres of land burned, causing extensive environmental damage. Although researchers have recently introduced machine learning models and algorithms for predicting wildfire risk, these efforts focused on specific perspectives and were restricted to a limited number of data parameters. In this paper, we propose two data-driven machine learning approaches based on random forest models to predict wildfire risk in areas near Monticello and Winters, California. The study demonstrates how the models were developed and applied with comprehensive data parameters, such as powerlines, terrain, and vegetation, from different perspectives, which improved the spatial and temporal accuracy of wildfire risk prediction, including fire ignition. The combined model uses the spatial and temporal parameters as a single combined dataset to train and predict fire risk, whereas the ensemble model was fed separate parameters that were later stacked to work as a single model. Our experiment shows that the combined model produced more accurate results than the ensemble of random forest models trained on separate spatial data. The models were validated with Receiver Operating Characteristic (ROC) curves, learning curves, and evaluation metrics such as accuracy, confusion matrices, and classification reports. The study achieved a state-of-the-art accuracy of 92% in predicting wildfire risk, including ignition, by using regional spatial and temporal data along with standard data parameters in Northern California.
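ROC-based validation of the kind mentioned above is often summarized by the area under the ROC curve (AUC). The sketch below computes AUC through its pairwise interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The labels and scores are hypothetical, and the function is an illustrative stand-in rather than the evaluation code used in the paper.

```python
def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Each (positive, negative) pair counts 1 if the positive is scored higher,
    0.5 on a tie, and 0 otherwise; the AUC is the mean over all pairs.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = fire occurred) and model risk scores for four grid cells.
labels = [0, 0, 1, 1]
risk_scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(labels, risk_scores)
print(f"AUC = {auc:.2f}")
```

The pairwise form runs in quadratic time, which is fine for a small illustration; sorting-based implementations are the usual choice for large datasets.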


2012 ◽  
Vol 8 (2) ◽  
pp. 44-63 ◽  
Author(s):  
Baoxun Xu ◽  
Joshua Zhexue Huang ◽  
Graham Williams ◽  
Qiang Wang ◽  
Yunming Ye

The selection of feature subspaces for growing decision trees is a key step in building random forest models. However, the common approach of randomly sampling a few features for the subspace is not suitable for high-dimensional data consisting of thousands of features, because such data often contain many features that are uninformative for classification, and random sampling often fails to include informative features in the selected subspaces. Consequently, the classification performance of the random forest model is significantly affected. In this paper, the authors propose an improved random forest method that uses a novel feature weighting method for subspace selection and therefore enhances classification performance on high-dimensional data. A series of experiments on 9 real-life high-dimensional datasets demonstrated that, using a subspace size of features where M is the total number of features in the dataset, our random forest model significantly outperforms existing random forest models.
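The general idea of weighted subspace selection can be sketched as below. The abstract does not describe how the authors compute their feature weights, so the weights here are hypothetical placeholders, and the sampling routine is a generic weighted draw without replacement rather than the paper's method.

```python
import random


def sample_weighted_subspace(weights, size, rng):
    """Draw `size` distinct feature indices, favoring features with larger weights."""
    if size > len(weights):
        raise ValueError("subspace size cannot exceed the number of features")
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(size):
        # Re-weight over the features not yet chosen, then draw one of them.
        current = [weights[i] for i in remaining]
        pick = rng.choices(remaining, weights=current, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


# Hypothetical informativeness weights: features 2, 4, and 7 are informative,
# the rest are nearly uninformative (values assumed for illustration).
feature_weights = [0.01, 0.01, 5.0, 0.01, 3.0, 0.01, 0.01, 2.0]

rng = random.Random(0)  # fixed seed so the draw is repeatable
subspace = sample_weighted_subspace(feature_weights, size=3, rng=rng)
print(sorted(subspace))
```

Under uniform sampling, a small subspace drawn from thousands of mostly uninformative features would often contain no useful feature at all; weighting the draw makes informative features far more likely to appear in each tree's subspace.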

