Student Performance Prediction Using A Cascaded Bi-level Feature Selection Approach

2021 ◽  
Vol 3 (3) ◽  
Author(s):  
Wokili Abdullahi ◽  
Mary Ogbuka Kenneth ◽  
Morufu Olalere

Features in educational data are often ambiguous, which leads to noisy features and the curse of dimensionality. Feature selection addresses these problems. Existing feature selection models use a single-level embedded, wrapper-based, or filter-based method. However, single-level filter-based methods ignore feature dependencies and the interaction with the classifier. Embedded and wrapper-based methods do interact with the classifier, but they select the optimal subset only for that particular classifier, so the selected features may perform worse with other classifiers. Hence, this research proposes a robust Cascade Bi-Level (CBL) feature selection technique for student performance prediction that minimizes the limitations of a single-level technique. The proposed CBL technique applies the Relief technique at the first level and Particle Swarm Optimization (PSO) at the second level. The technique was evaluated on the UCI student performance dataset. On the binary classification task, it achieved an accuracy of 94.94%, compared with 93.67% for single-level PSO. These results show that CBL can effectively predict student performance.
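A minimal sketch of such a two-level cascade may help make the idea concrete. It is not the authors' implementation: the basic two-class Relief variant, the binary PSO with a sigmoid transfer function, the leave-one-out 1-nearest-neighbour fitness, the PSO coefficients, and the synthetic data are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
import random

def relief_scores(X, y, n_iter=50, seed=0):
    """Basic two-class Relief; assumes features are scaled to [0, 1]."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    def dist(a, b):
        return sum(abs(p - q) for p, q in zip(a, b))
    w = [0.0] * d
    for _ in range(n_iter):
        i = rng.randrange(n)
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        h = min(hits, key=lambda j: dist(X[i], X[j]))    # nearest same-class row
        m = min(misses, key=lambda j: dist(X[i], X[j]))  # nearest other-class row
        for f in range(d):
            # Reward separation between classes, penalise spread within a class.
            w[f] += (abs(X[i][f] - X[m][f]) - abs(X[i][f] - X[h][f])) / n_iter
    return w

def loo_1nn_accuracy(X, y, feats):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `feats`."""
    if not feats:
        return 0.0
    correct = 0
    for i in range(len(X)):
        j = min((k for k in range(len(X)) if k != i),
                key=lambda k: sum((X[i][f] - X[k][f]) ** 2 for f in feats))
        correct += y[j] == y[i]
    return correct / len(X)

def binary_pso(fitness, dim, n_particles=8, n_iter=15, seed=0):
    """Binary PSO: each position bit is resampled with probability sigmoid(velocity)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for k in range(n_particles):
            for f in range(dim):
                vel[k][f] = (0.7 * vel[k][f]
                             + 1.5 * rng.random() * (pbest[k][f] - pos[k][f])
                             + 1.5 * rng.random() * (gbest[f] - pos[k][f]))
                pos[k][f] = 1 if rng.random() < 1 / (1 + math.exp(-vel[k][f])) else 0
            fit = fitness(pos[k])
            if fit > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[k][:], fit
    return gbest, gbest_fit

def cascaded_select(X, y, keep=4):
    """Level 1: keep the top-`keep` Relief features. Level 2: PSO over that subset."""
    scores = relief_scores(X, y)
    level1 = sorted(range(len(scores)), key=lambda f: -scores[f])[:keep]
    def fitness(mask):
        return loo_1nn_accuracy(X, y, [f for f, bit in zip(level1, mask) if bit])
    mask, acc = binary_pso(fitness, len(level1))
    return level1, sorted(f for f, bit in zip(level1, mask) if bit), acc

def make_toy(n=40, d=6, seed=1):
    """Synthetic data (an assumption): feature 0 separates the classes, the rest is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.random() for _ in range(d)]
        row[0] = 0.1 * rng.random() + (0.8 if label else 0.0)
        X.append(row)
        y.append(label)
    return X, y
```

Calling `cascaded_select(*make_toy())` returns the first-level subset, the second-level subset, and the fitness of the best particle. The filter stage shrinks the search space cheaply, so the wrapper stage evaluates the classifier on far fewer candidate subsets.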

2021 ◽  
Vol 13 (6) ◽  
pp. 41-52
Author(s):  
Rahul Deo Verma ◽  
Shefalika Ghosh Samaddar ◽  
A. B. Samaddar

The Border Gateway Protocol (BGP) provides crucial routing information for the Internet infrastructure. Abnormal routing behavior affects the stability and connectivity of the global Internet. The biggest hurdles in detecting BGP attacks are the extremely unbalanced class distribution of the datasets and the dynamic nature of the network, both of which degrade classifier performance. In this paper we propose an efficient approach to managing these problems. The approach tackles unbalanced classification by turning the binary classification problem into a multiclass classification problem. This is achieved by splitting the majority-class samples evenly into multiple segments using Affinity Propagation, where the number of segments is chosen so that the number of samples in any segment closely matches the number of minority-class samples. These segments, together with the minority class, are then treated as different classes and used to train the Extreme Learning Machine (ELM). The RIPE and BCNET datasets are used to evaluate the performance of the proposed technique. When no feature selection is used, the proposed technique improves the F1 score by 1.9% compared to state-of-the-art techniques. With the Fisher feature selection algorithm, the proposed algorithm achieved the highest F1 score of 76.3%, a 1.7% improvement over the compared techniques. Additionally, the MIQ feature selection technique improves the accuracy by 3.5%. For the BCNET dataset, the proposed technique improves the F1 score by 1.8% with the Fisher feature selection technique. The experimental findings support a substantial improvement in performance over previous approaches.
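The relabeling step can be sketched as follows. Note the substitution: the abstract splits the majority class with Affinity Propagation, whereas this sketch simply cuts the majority samples into equal consecutive chunks, which only illustrates how the segment count and the new labels are derived. The function names and the label convention (minority becomes class 0, segments become classes 1 to `n_seg`) are assumptions.

```python
def split_majority(labels, majority=0, minority=1):
    """Turn binary labels into multiclass labels with roughly balanced classes.

    Assumes `labels` contains only the `majority` and `minority` values.
    """
    n_min = sum(1 for label in labels if label == minority)
    if n_min == 0:
        raise ValueError("no minority samples to balance against")
    maj_idx = [i for i, label in enumerate(labels) if label == majority]
    # Pick the segment count so that each segment is about as large as the minority class.
    n_seg = max(1, round(len(maj_idx) / n_min))
    new_labels = [0] * len(labels)  # minority samples keep class 0
    for pos, i in enumerate(maj_idx):
        new_labels[i] = 1 + pos * n_seg // len(maj_idx)  # classes 1..n_seg
    return new_labels, n_seg

def to_binary(pred, majority=0, minority=1):
    """Map a multiclass prediction back to the original binary label."""
    return minority if pred == 0 else majority
```

With 90 majority and 10 minority samples this yields nine majority segments of ten samples each, so the classifier is trained on ten classes of equal size and its predictions are folded back to two classes with `to_binary`.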


Electronics ◽  
2021 ◽  
Vol 10 (17) ◽  
pp. 2099
Author(s):  
Paweł Ziemba ◽  
Jarosław Becker ◽  
Aneta Becker ◽  
Aleksandra Radomska-Zalas ◽  
Mateusz Pawluk ◽  
...  

One of the important research problems for financial institutions is the assessment of credit risk and the decision whether to grant or refuse a loan. Machine learning methods are increasingly employed to solve such problems. However, selecting an appropriate feature selection technique, sampling mechanism, and classifier for credit decision support is very challenging and can affect the quality of the loan recommendations. To address this task, this article examines the effectiveness of various data science techniques for credit decision support. In particular, a processing pipeline was designed that consists of methods for data resampling, feature discretization, feature selection, and binary classification. We suggest building decision models from pertinent methods for each of these stages. The feasibility of the selected models was analyzed through rigorous experiments on real data describing clients' ability to repay loans. During the experiments, we analyzed the impact of feature selection on the results of binary classification, and the impact of data resampling with feature discretization on the results of feature selection and binary classification. After experimental evaluation, we found that the correlation-based feature selection technique and the random forest classifier yield superior performance on the underlying problem.
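The preprocessing stages of such a pipeline can be sketched in a few lines. This is an illustration under stated assumptions, not the article's pipeline: random oversampling stands in for the unspecified resampling method, equal-width binning for the discretization method, and a ranking by absolute Pearson correlation with the label is a simplified stand-in for correlation-based feature selection. The classifier stage is omitted, and all function names are hypothetical.

```python
import random

def oversample(X, y, seed=0):
    """Random oversampling: duplicate rows of smaller classes until all classes match."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    Xb, yb = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            Xb.append(row)
            yb.append(label)
    return Xb, yb

def discretize(X, n_bins=3):
    """Equal-width binning applied to each feature column independently."""
    binned_cols = []
    for col in zip(*X):
        lo, hi = min(col), max(col)
        width = (hi - lo) / n_bins or 1.0  # constant column: avoid division by zero
        binned_cols.append([min(int((v - lo) / width), n_bins - 1) for v in col])
    return [list(row) for row in zip(*binned_cols)]

def rank_by_correlation(X, y, k):
    """Return the indices of the k features with the largest |Pearson r| to the label."""
    def corr(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
        va = sum((p - ma) ** 2 for p in a)
        vb = sum((q - mb) ** 2 for q in b)
        return cov / (va * vb) ** 0.5 if va and vb else 0.0
    scores = [abs(corr(col, y)) for col in zip(*X)]
    return sorted(range(len(scores)), key=lambda f: -scores[f])[:k]
```

In practice, resampling should be applied to the training split only; otherwise duplicated rows can leak into the evaluation data and inflate the measured performance.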


2018 ◽  
Vol 18 (4) ◽  
pp. 15-28 ◽  
Author(s):  
B. Zerhari ◽  
A. Ait Lehcen ◽  
S. Mouline

Feature selection has been a very active research topic that addresses the problem of reducing dimensionality. Meanwhile, datasets are continuously growing over time in both the number of samples and the number of features. As a result, handling both irrelevant and redundant features has become a real challenge. In this paper we propose a new straightforward framework that combines horizontal and vertical distributed feature selection, called the Horizo-Vertical Distributed Feature Selection approach (HVDFS), aimed at achieving good performance as well as reducing the number of features. The effectiveness of our approach is demonstrated on three well-known datasets, in comparison with the centralized approach and the previous distributed approach, using four well-known classifiers.
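As a rough illustration of what combining the two distribution schemes means, the sketch below cuts a dataset both horizontally (by samples) and vertically (by features), so that a feature selector could run on each block independently. The abstract does not say how HVDFS forms the blocks or merges the per-block results, so the round-robin assignment and the `partition` function are assumptions, and the merging step is left out.

```python
def partition(X, n_row_parts, n_col_parts):
    """Split X into n_row_parts * n_col_parts blocks.

    Each block is (feature_indices, rows), where rows hold only those features.
    Samples and features are dealt out round-robin (an assumption).
    """
    n, d = len(X), len(X[0])
    row_groups = [list(range(r, n, n_row_parts)) for r in range(n_row_parts)]
    col_groups = [list(range(c, d, n_col_parts)) for c in range(n_col_parts)]
    blocks = []
    for rows in row_groups:
        for cols in col_groups:
            blocks.append((cols, [[X[i][j] for j in cols] for i in rows]))
    return blocks
```

Keeping the original feature indices with each block lets the features chosen within a block be mapped back to columns of the full dataset.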


Author(s):  
Hua Tang ◽  
Chunmei Zhang ◽  
Rong Chen ◽  
Po Huang ◽  
Chenggang Duan ◽  
...  

Author(s):  
Uttamarani Pati ◽  
Papia Ray ◽  
Arvind R. Singh

Very short term load forecasting (VSTLF) plays a pivotal role in helping utility workers make proper decisions regarding generation scheduling, the size of the spinning reserve, and maintaining equilibrium between the power generated by the utility and the load demand. However, developing an effective VSTLF model is challenging because of noisy real-time data and the complicated features found in load demand variations over time. This work proposes a hybrid approach for hour-ahead VSTLF that combines an incomplete fuzzy decision system (IFDS) with a genetic algorithm (GA) based feature selection technique. The aim is to determine the relevant load features and eliminate redundant ones to form a less complex forecasting model. The proposed method takes the time of day, temperature, humidity, and dew point as inputs and outputs the forecasted load. The input data and historical load data were collected from the Northern Regional Load Dispatch Centre (NRLDC), New Delhi, for December 2009, January 2010, and February 2010. To validate the efficacy of the proposed method, its performance is compared with that of conventional AI techniques such as ANN and ANFIS, which are integrated with the same GA-based feature selection technique to boost their performance. The accuracy of these techniques is measured by their mean absolute percentage error (MAPE) and normalized root mean square error (nRMSE). Compared with the other conventional AI techniques and with methods from previous studies, the proposed method is found to have acceptable accuracy for 1 h ahead electrical load forecasting.
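For reference, the two error measures can be computed as below. MAPE follows its standard definition. The abstract does not state how nRMSE is normalized, so dividing the RMSE by the mean of the actual load is an assumption here; normalizing by the range of the actual values is another common convention.

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

def nrmse(actual, forecast):
    """RMSE divided by the mean of the actual values (normalization is an assumption)."""
    rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))
    return rmse / (sum(actual) / len(actual))
```

For actual loads of 100 and 200 with forecasts of 110 and 180, each forecast is off by 10% of the actual value, so the MAPE is 10%.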

