COMPARATIVE STUDY: FEATURE SELECTION METHODS IN THE BLENDED LEARNING ENVIRONMENT

2017 ◽  
Vol 16 (2) ◽  
pp. 95
Author(s):  
Gabrijela Dimić ◽  
Dejan Rančić ◽  
Ivan Milentijević ◽  
Petar Spalević ◽  
Katarina Plećić

Research presented in this paper deals with the unknown behavior patterns of students in a blended learning environment. To improve prediction accuracy, it was necessary to determine a methodology for assessing students' activities. The training set was created by combining distributed sources: the Moodle database and the traditional learning process. The methodology emphasizes the data mining preprocessing phase: transformation and feature selection. The Information Gain, Symmetrical Uncertainty, ReliefF, Correlation-based Feature Selection, Wrapper Subset Evaluation, and Classifier Subset Evaluator feature selection methods were applied to find the most relevant subset. Statistical dependence was determined by calculating the mutual information measure. Naïve Bayes, Aggregating One-Dependence Estimators, Decision Tree, and Support Vector Machine classifiers were trained on subsets of different cardinality. Models were evaluated by comparative analysis of statistical parameters and the time required to build them. We conclude that ReliefF, Wrapper Subset Evaluation, and mutual information are the most suitable feature selection methods for the blended learning environment. The major contribution of the presented research is the selection of an optimal low-cardinality subset of students' activities and a significant improvement in prediction accuracy in the blended learning environment.
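The mutual information measure used above to quantify statistical dependence between an activity feature and the outcome can be sketched in a few lines of plain Python (the function name and the toy activity data are illustrative, not from the paper):

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """I(X;Y) in bits, estimated from two parallel sequences of discrete values."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# Toy example: a binned "forum activity" feature vs. a pass/fail outcome.
activity = ["low", "low", "high", "high", "low", "high"]
outcome  = ["fail", "fail", "pass", "pass", "fail", "pass"]
print(mutual_information(activity, outcome))  # 1.0 bit: activity determines outcome
```

A feature whose value fully determines the class carries I(X;Y) = H(Y) bits; an irrelevant feature scores near zero, which is the basis for ranking activities by dependence.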

2016 ◽  
Vol 9 (2) ◽  
pp. 106
Author(s):  
Ratri Enggar Pawening ◽  
Tio Darmawan ◽  
Rizqa Raaiqa Bintana ◽  
Agus Zainal Arifin ◽  
Darlis Herumurti

Datasets with heterogeneous features can yield inappropriate feature selection results because it is difficult to evaluate heterogeneous features concurrently. Feature transformation (FT) is one way to handle heterogeneous feature subset selection, but transforming non-numerical features into numerical ones may introduce redundancy with the original numerical features. In this paper, we propose a method for selecting a feature subset based on mutual information (MI) for classifying heterogeneous features. We use unsupervised feature transformation (UFT) to transform non-numerical features into numerical ones, and joint mutual information maximisation (JMIM) to select the feature subset with consideration of the class label. The transformed and original features are combined, a feature subset is determined using JMIM, and the result is classified with the support vector machine (SVM) algorithm. Classification accuracy is measured for each number of selected features and compared between the UFT-JMIM and Dummy-JMIM methods. The average classification accuracy over all experiments in this study is about 84.47% for UFT-JMIM and about 84.24% for Dummy-JMIM. This result shows that UFT-JMIM can minimize information loss between transformed and original features and select a feature subset that avoids redundant and irrelevant features.
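A minimal sketch of the greedy JMIM selection rule on discrete toy data (the UFT transformation step is omitted and the function names are mine; this illustrates the criterion, not the authors' implementation):

```python
from collections import Counter
from math import log2

def mi(xs, ys):
    """I(X;Y) in bits for discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

def jmim_select(features, labels, k):
    """Greedy JMIM: after seeding with the most relevant single feature,
    repeatedly add the candidate j maximising min over selected s of I(X_j, X_s; Y)."""
    names = list(features)
    selected = [max(names, key=lambda j: mi(features[j], labels))]
    while len(selected) < k:
        rest = [j for j in names if j not in selected]
        selected.append(max(rest, key=lambda j: min(
            mi(list(zip(features[j], features[s])), labels) for s in selected)))
    return selected
```

In an XOR-style toy problem no single feature predicts the label, yet JMIM adds the feature that is informative jointly with the one already selected while skipping a redundant duplicate, which is exactly the behavior that distinguishes it from single-feature ranking.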


2021 ◽  
Vol 348 ◽  
pp. 01002
Author(s):  
Assia Najm ◽  
Abdelali Zakrani ◽  
Abdelaziz Marzak

Software cost prediction is a crucial element of a project's success because it helps project managers estimate the effort needed for any project. The literature offers many machine learning methods, such as decision trees, artificial neural networks (ANN), and support vector regressors (SVR). However, many studies confirm that accurate estimation depends greatly on hyperparameter optimization and on proper input feature selection, which strongly impacts the accuracy of software cost prediction models (SCPM). In this paper, we propose an enhanced model using SVR and the Optainet algorithm. Optainet is used simultaneously to select the best set of features and to tune the parameters of the SVR model. The experimental evaluation was conducted using a 30% holdout over seven datasets. The performance of the suggested model is compared to the SVR model tuned with Optainet but without feature selection, and to the Boruta and random forest feature selection methods. The experiments show that, over all datasets, the Optainet-based method significantly improves the accuracy of the SVR model and outperforms the random forest and Boruta feature selection methods.
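The idea of encoding a feature mask together with SVR hyperparameters in a single search candidate can be sketched as a toy immune-inspired loop. This is a simplified stand-in for Optainet with hypothetical parameter ranges, not the paper's algorithm; a real fitness function would train an SVR on the masked features of the 70% split and return a negative error on the 30% holdout:

```python
import random

def optainet_like_search(n_features, fitness, generations=40, pop=8, seed=0):
    """Toy immune-inspired search: each antibody encodes a feature mask plus
    SVR hyperparameters (C, epsilon); clone the best antibody and mutate clones."""
    rng = random.Random(seed)

    def random_antibody():
        mask = tuple(rng.random() < 0.5 for _ in range(n_features))
        return (mask, 10 ** rng.uniform(-2, 3), 10 ** rng.uniform(-3, 0))

    def mutate(ab):
        mask, C, eps = ab
        i = rng.randrange(n_features)                    # flip one mask bit
        mask = mask[:i] + (not mask[i],) + mask[i + 1:]
        return (mask, C * 10 ** rng.uniform(-0.3, 0.3),
                eps * 10 ** rng.uniform(-0.3, 0.3))      # jitter hyperparameters

    population = [random_antibody() for _ in range(pop)]
    for _ in range(generations):
        best = max(population, key=fitness)              # elitism: keep the best
        population = [best] + [mutate(best) for _ in range(pop - 1)]
    return max(population, key=fitness)
```

Because one candidate carries both the subset and the hyperparameters, selection and tuning are optimized jointly rather than in two separate passes.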


2019 ◽  
Vol 47 (2) ◽  
pp. 76-83 ◽  
Author(s):  
Gabrijela Dimic ◽  
Dejan Rancic ◽  
Nemanja Macek ◽  
Petar Spalevic ◽  
Vida Drasute

Purpose – This paper deals with the previously unknown prediction accuracy of students' activity patterns in a blended learning environment.
Design/methodology/approach – To extract the most relevant activity feature subset, different feature selection methods were applied, and classification models were compared across subsets of different cardinality.
Findings – The experimental evaluation contradicts the hypothesis that reducing feature-vector dimensionality increases prediction accuracy.
Research limitations/implications – Improving prediction accuracy in the described learning environment relied on applying the synthetic minority oversampling technique, which affected the results of the correlation-based feature selection method.
Originality/value – The major contribution of the research is the proposed methodology for selecting an optimal low-cardinality subset of students' activities, with a significant improvement in prediction accuracy in a blended learning environment.
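The synthetic minority oversampling technique (SMOTE) mentioned in the limitations can be sketched as interpolation between minority-class neighbours (a simplified stdlib-only stand-in, not the authors' implementation):

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples: pick a minority point, pick one
    of its k nearest minority neighbours, and interpolate between the two."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        p = rng.choice(minority)
        neighbours = sorted((q for q in minority if q is not p),
                            key=lambda q: dist2(p, q))[:k]
        q = rng.choice(neighbours)
        t = rng.random()                                  # position on the segment
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(p, q)))
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, oversampling changes the class distribution, and with it any correlation-based feature score computed afterwards.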


2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Md Rahat Hossain ◽  
Amanullah Maung Than Oo ◽  
A. B. M. Shawkat Ali

This paper shows empirically that applying selected feature subsets to machine learning techniques significantly improves the accuracy of solar power prediction. Experiments are performed using five well-known wrapper feature selection methods to obtain the solar power prediction accuracy of machine learning techniques with selected feature subsets. In all experiments, the machine learning techniques least median square (LMS), multilayer perceptron (MLP), and support vector machine (SVM) are used. These results are then compared with the solar power prediction accuracy of the same machine learning techniques without applying feature selection (WAFS). Experiments are carried out using reliable, real-life historical meteorological data. The comparison clearly shows that LMS, MLP, and SVM provide better prediction accuracy (i.e., reduced MAE and MASE) with selected feature subsets than without them. The experimental results of this paper support the concrete verdict that devoting more attention and effort to feature subset selection can significantly improve the accuracy of solar power prediction.
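A wrapper method of the kind used here evaluates candidate subsets by running the learner itself. A minimal sketch with greedy forward selection and a leave-one-out 1-nearest-neighbour evaluator (the evaluator and toy data are my illustration; the paper's learners are LMS, MLP, and SVM):

```python
def loo_1nn_accuracy(X, y, cols):
    """Leave-one-out accuracy of a 1-NN classifier restricted to columns `cols`."""
    correct = 0
    for i, (xi, yi) in enumerate(zip(X, y)):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum((X[j][c] - xi[c]) ** 2 for c in cols))
        correct += (y[nearest] == yi)
    return correct / len(X)

def forward_wrapper(X, y, n_keep):
    """Greedy forward wrapper: repeatedly add the feature that most
    improves the wrapped learner's accuracy."""
    selected, candidates = [], list(range(len(X[0])))
    while len(selected) < n_keep:
        best_c = max(candidates,
                     key=lambda c: loo_1nn_accuracy(X, y, selected + [c]))
        selected.append(best_c)
        candidates.remove(best_c)
    return selected
```

The wrapper correctly prefers the feature that separates the classes over a noisy one and a constant one, at the cost of retraining the learner for every candidate subset.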


Author(s):  
B. Venkatesh ◽  
J. Anuradha

In microarray data, it is difficult to achieve high classification accuracy because of high dimensionality, irrelevant features, and noisy data; such data also contain many gene expression values but few samples. To increase the classification accuracy and the processing speed of the model, an optimal number of features needs to be extracted, which can be achieved by applying a feature selection method. In this paper, we propose a hybrid ensemble feature selection method with two phases, a filter phase and a wrapper phase. In the filter phase, an ensemble technique aggregates the feature ranks of the Relief, minimum redundancy Maximum Relevance (mRMR), and Feature Correlation (FC) filter methods, using fuzzy Gaussian membership function ordering to aggregate the ranks. In the wrapper phase, Improved Binary Particle Swarm Optimization (IBPSO) selects the optimal features, with an RBF kernel-based Support Vector Machine (SVM) classifier as the evaluator. The performance of the proposed model is compared with state-of-the-art feature selection methods on five benchmark datasets, using accuracy, recall, precision, and F1-score as evaluation metrics. The experimental results show that the proposed method outperforms the other feature selection methods.
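The filter-phase rank aggregation can be illustrated with a simple Borda-style mean-rank scheme (a stand-in for the paper's fuzzy Gaussian membership ordering, whose details the abstract does not give):

```python
def aggregate_ranks(rankings):
    """Borda-style aggregation: average each feature's position across the
    per-filter rankings (position 0 = most relevant) and sort ascending."""
    features = rankings[0]
    mean_rank = {f: sum(r.index(f) for r in rankings) / len(rankings)
                 for f in features}
    return sorted(features, key=lambda f: (mean_rank[f], f))
```

Each element of `rankings` would be the ordering produced by one filter (e.g. Relief, mRMR, FC); aggregating smooths out the bias of any single filter before the wrapper phase refines the subset.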


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 2910
Author(s):  
Kei Suzuki ◽  
Tipporn Laohakangvalvit ◽  
Ryota Matsubara ◽  
Midori Sugaya

In human emotion estimation using an electroencephalogram (EEG) and heart rate variability (HRV), there are, as far as we know, two main issues. The first is that measurement devices for physiological signals are expensive and not easy to wear. The second is that unnecessary physiological indexes have not been removed, which is likely to decrease the accuracy of machine learning models. In this study, we used a single-channel EEG sensor and a photoplethysmography (PPG) sensor, which are inexpensive and easy to wear. We collected data from 25 participants (18 males and 7 females) and used a deep learning algorithm to construct an emotion classification model based on the Arousal-Valence space, using several feature combinations obtained from physiological indexes selected according to our criteria, including our proposed feature selection methods. We then verified accuracy by applying stratified 10-fold cross-validation to the constructed models. The results showed model accuracies as high as 90% to 99% when applying the feature selection methods we proposed, which suggests that a small number of physiological indexes, even from inexpensive sensors, can be used to construct an accurate emotion classification model if an appropriate feature selection method is applied. Our results contribute to emotion classification models with higher accuracy, lower cost, and less time consumption, with potential for further application in various areas.
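The stratified 10-fold cross-validation used for accuracy verification can be sketched as follows (a generic illustration, not the authors' code): each test fold keeps the same class proportions as the full label set, which matters when emotion classes are imbalanced.

```python
import random

def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds preserve the
    class proportions of `labels`."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)                    # spread each class round-robin
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for t in range(k):
        test = sorted(folds[t])
        train = sorted(i for f in range(k) if f != t for i in folds[f])
        yield train, test
```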


Author(s):  
Gang Liu ◽  
Chunlei Yang ◽  
Sen Liu ◽  
Chunbao Xiao ◽  
Bin Song

A feature selection method based on mutual information and the support vector machine (SVM) is proposed to eliminate redundant features and improve classification accuracy. First, the local correlation between features and the overall correlation are calculated by mutual information. This correlation reflects the information-inclusion relationship between features, so features are evaluated and redundant ones eliminated by analyzing it. Next, the concept of mean impact value (MIV) is defined, and the degree of influence of the input variables on the output variables of the SVM network is calculated on the basis of MIV. The importance weights of the features, described by MIV, are sorted in descending order. Finally, the SVM classifier implements feature selection according to the classification accuracy of feature combinations, taking the MIV ordering of the features as a reference. Simulation experiments on three standard UCI data sets show that this method not only effectively reduces feature dimensionality while maintaining high classification accuracy, but also ensures good robustness.
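The MIV step can be sketched directly from its usual definition: perturb each input by a small fraction and average the change in the trained model's output (the 10% perturbation size and the toy linear model below are illustrative assumptions, not the paper's settings):

```python
def mean_impact_values(model, X, delta=0.1):
    """MIV of each input feature: scale the feature by (1 +/- delta) and
    average the resulting change in the model's output over the data."""
    mivs = []
    for f in range(len(X[0])):
        diffs = []
        for row in X:
            up, down = list(row), list(row)
            up[f] *= 1 + delta
            down[f] *= 1 - delta
            diffs.append(model(up) - model(down))
        mivs.append(sum(diffs) / len(diffs))
    return mivs
```

Features are then ranked by |MIV| in descending order, as in the paper, and subsets are grown in that order while monitoring classification accuracy.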


2011 ◽  
Vol 268-270 ◽  
pp. 697-700
Author(s):  
Rui Xue Duan ◽  
Xiao Jie Wang ◽  
Wen Feng Li

As the volume of short text documents online grows tremendously, organizing short texts well becomes increasingly urgent. However, traditional feature selection methods are not suitable for short texts. In this paper, we propose a method that incorporates syntactic information for short texts, emphasizing features that have more dependency relations with other words. The SVM classifier and the Weka machine learning environment are used in our experiments. The experimental results show that by incorporating syntactic information into short texts, we obtain more powerful features than traditional feature selection methods such as DF and CHI. The precision of short text classification improved from 86.2% to 90.8%.
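The CHI baseline compared against here scores each term for a class from a 2x2 contingency table of term occurrence versus class membership; a minimal sketch:

```python
def chi_square(n11, n10, n01, n00):
    """CHI score of a term for a class from the 2x2 contingency table:
    n11 = docs of the class containing the term, n10 = other docs with the term,
    n01 = docs of the class without the term, n00 = the rest."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return num / den if den else 0.0
```

In very short texts most terms occur once or not at all, so these counts are sparse and unstable, which is the weakness the syntactic-dependency features are meant to address.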


Author(s):  
ZENGLIN XU ◽  
IRWIN KING ◽  
MICHAEL R. LYU

Feature selection is an important task in pattern recognition. The Support Vector Machine (SVM) and the Minimax Probability Machine (MPM) have been successfully used as classification frameworks for feature selection. However, these paradigms cannot automatically control the balance between prediction accuracy and the number of selected features, and the selected feature subsets are not stable across different data partitions. The Minimum Error Minimax Probability Machine (MEMPM) has recently been proposed for classification. In this paper, we apply MEMPM to select an optimal feature subset with good stability and an automatic balance between prediction accuracy and the size of the feature subset. Experiments against feature selection with SVM and MPM show the advantages of the proposed MEMPM formulation in stability and in automatically balancing feature subset size against prediction accuracy.
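The subset-stability issue raised here is commonly quantified as the average pairwise Jaccard similarity of the subsets selected on different data partitions (this particular metric is my illustration, not necessarily the one used in the paper):

```python
def stability(subsets):
    """Average pairwise Jaccard similarity of the feature subsets selected
    on different data partitions (1.0 means perfectly stable selection)."""
    pairs = [(a, b) for i, a in enumerate(subsets) for b in subsets[i + 1:]]
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)
```

A method that returns the same subset on every partition scores 1.0; disjoint subsets score 0.0, making instability across partitions directly comparable between methods.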


2020 ◽  
Vol 14 (3) ◽  
pp. 269-279
Author(s):  
Hayet Djellali ◽  
Nacira Ghoualmi-Zine ◽  
Souad Guessoum

This paper investigates feature selection methods based on a hybrid architecture using a feature selection algorithm called Adapted Fast Correlation-Based Feature selection with Support Vector Machine Recursive Feature Elimination (AFCBF-SVMRFE). AFCBF-SVMRFE combines the SVMRFE embedded method with correlation-based feature selection and has three stages: relevance analysis, redundancy analysis, and a performance evaluation and feature restoration stage. Experiments show that the proposed method, tested with different classifiers (Support Vector Machine, SVM, and K-nearest neighbors, KNN), provides the best accuracy on various datasets. The SVM classifier outperforms the KNN classifier on these data. AFCBF-SVMRFE outperforms the FCBF multivariate filter, SVMRFE, particle swarm optimization (PSO), and artificial bee colony (ABC).
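The correlation measure behind FCBF-style relevance and redundancy analysis is symmetrical uncertainty; a minimal sketch on discrete toy data (function names are mine):

```python
from collections import Counter
from math import log2

def entropy(xs):
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def symmetrical_uncertainty(xs, ys):
    """SU(X,Y) = 2*I(X;Y) / (H(X)+H(Y)) -- the score FCBF-style methods use
    both for relevance (feature vs. class) and redundancy (feature vs. feature)."""
    hx, hy = entropy(xs), entropy(ys)
    ixy = hx + hy - entropy(list(zip(xs, ys)))
    return 2 * ixy / (hx + hy) if hx + hy else 0.0
```

The relevance stage keeps features whose SU with the class exceeds a threshold; the redundancy stage drops a feature whose SU with an already-kept feature exceeds its SU with the class.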

