Cross validation of a structural relationship model of student performance

1986 ◽  
Vol 50 (9) ◽  
pp. 545-548
Author(s):  
RH Potter

E-learning data becomes ‘big’ data as it comprises a huge volume of both structured and unstructured records, and the inherent limitations of the relational databases maintained in this context make it difficult to apply the data and extract meaningful outputs. Data modeling is therefore recommended to design data views at various levels, whether conceptual or physical. Most educational organizations are keen on collecting, storing, and analyzing student data because it adds significant value to the decision-making process. Data modeling through entity-relationship models or query views plays an important role in dealing with big data, since around 85% of big data is semi-structured; data modeling should therefore be carried out according to each learning institution’s needs. Making big data components reside in the data model is challenging. This paper establishes data modeling techniques applied to a reasonably ‘big’ e-learning dataset. Prediction models generated from these data are accurate provided the training and testing sets are governed properly, despite the size and complexity of the data. Student performance by study credits (partitioned into three classes: low, medium, high) is classified with respect to engagement attributes (activity types, sum of clicks made, duration in days), obtaining a maximum accuracy of 90.923%.
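The abstract does not name the classifier or the split procedure, so the following is only a minimal sketch of the described setup: performance bands derived from study credits, predicted from engagement attributes, with a stratified held-out test set so that training and testing data remain properly governed. The file name, column names, credit thresholds, and the random-forest classifier are assumptions for illustration, not details from the paper.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Hypothetical input: one row per student with engagement attributes and credits
df = pd.read_csv("student_engagement.csv")  # assumed file and column names

# Partition the target into low/medium/high bands by study credits
# (the thresholds below are illustrative, not the paper's cut points)
df["performance"] = pd.cut(df["studied_credits"],
                           bins=[0, 60, 120, float("inf")],
                           labels=["low", "medium", "high"])

# Engagement attributes: activity type, total clicks, duration in days
X = pd.get_dummies(df[["activity_type", "sum_click", "duration_days"]])
y = df["performance"]

# Governed split: stratify so each class appears in both training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
print("Held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

Stratifying the split keeps the low/medium/high class proportions similar in both partitions, which matters when the class sizes are unbalanced.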


2015 ◽  
Author(s):  
Abu Sayed Md. Al Mamun ◽  
Yong Zulina Zubairi ◽  
Abdul Ghapor Hussin ◽  
A. H. M. Rahmatullah Imon

Author(s):  
Roberto Bertolini ◽  
Stephen J. Finch ◽  
Ross H. Nehm

Abstract: Educators seek to harness knowledge from educational corpora to improve student performance outcomes. Although prior studies have compared the efficacy of data mining methods (DMMs) in pipelines for forecasting student success, less work has focused on identifying a set of relevant features prior to model development and quantifying the stability of feature selection techniques. Pinpointing a subset of pertinent features can (1) reduce the number of variables that need to be managed by stakeholders, (2) make “black-box” algorithms more interpretable, and (3) provide greater guidance for faculty to implement targeted interventions. To that end, we introduce a methodology integrating feature selection with cross-validation and rank each feature on subsets of the training corpus. This modified pipeline was applied to forecast the performance of 3225 students in a baccalaureate science course using a set of 57 features, four DMMs, and four filter feature selection techniques. Correlation Attribute Evaluation (CAE) and Fisher’s Scoring Algorithm (FSA) achieved significantly higher Area Under the Curve (AUC) values for logistic regression (LR) and elastic net regression (GLMNET), compared to when this pipeline step was omitted. Relief Attribute Evaluation (RAE) was highly unstable and produced models with the poorest prediction performance. Borda’s method identified grade point average, number of credits taken, and performance on concept inventory assessments as the primary factors impacting predictions of student performance. We discuss the benefits of this approach when developing data pipelines for predictive modeling in undergraduate settings that are more interpretable and actionable for faculty and stakeholders.
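As a rough illustration of the modified pipeline, the sketch below performs filter-based feature ranking inside each cross-validation fold (so the held-out fold never informs selection) and aggregates the per-fold rankings with Borda’s method. The synthetic data, the ANOVA F-score filter, the choice of k, and plain logistic regression are stand-ins for the paper’s corpus, its CAE/FSA/RAE filters, and its four DMMs; none of these specifics come from the study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a 3225-student, 57-feature corpus (not the real data)
X, y = make_classification(n_samples=3225, n_features=57, n_informative=10,
                           random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# (1) Filter selection nested inside the pipeline: the selector is refit on each
# training fold, so the held-out fold never influences which features are kept.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=15)),
    ("clf", LogisticRegression(max_iter=1000)),
])
auc = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print("Mean cross-validated AUC: %.3f" % auc.mean())

# (2) Rank features on each training subset, then combine ranks with Borda's
# method: each feature scores (n_features - rank) per fold, summed over folds.
n_features = X.shape[1]
borda = np.zeros(n_features)
for train_idx, _ in cv.split(X, y):
    f_scores, _ = f_classif(X[train_idx], y[train_idx])
    ranks = np.argsort(np.argsort(-f_scores))  # rank 0 = strongest feature
    borda += n_features - ranks
print("Top 5 features by Borda score:", np.argsort(-borda)[:5])
```

Placing the selector inside the Pipeline guarantees it is refit on every training fold, which is the leakage-avoidance property the feature-selection-within-cross-validation design is meant to provide.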

