scholarly journals Information-theoretic feature selection algorithms for text classification

Author(s):  
J. Novovicovai ◽  
A. Malik
2019 ◽  
Vol 8 (4) ◽  
pp. 1333-1338

Text classification is a vital process due to the large volume of electronic articles. One of the drawbacks of text classification is the high dimensionality of feature space. Scholars developed several algorithms to choose relevant features from article text such as Chi-square (x2 ), Information Gain (IG), and Correlation (CFS). These algorithms have been investigated widely for English text, while studies for Arabic text are still limited. In this paper, we investigated four well-known algorithms: Support Vector Machines (SVMs), Naïve Bayes (NB), K-Nearest Neighbors (KNN), and Decision Tree against benchmark Arabic textual datasets, called Saudi Press Agency (SPA) to evaluate the impact of feature selection methods. Using the WEKA tool, we have experimented the application of the four mentioned classification algorithms with and without feature selection algorithms. The results provided clear evidence that the three feature selection methods often improves classification accuracy by eliminating irrelevant features.


Author(s):  
Manpreet Kaur ◽  
Chamkaur Singh

Educational Data Mining (EDM) is an emerging research area help the educational institutions to improve the performance of their students. Feature Selection (FS) algorithms remove irrelevant data from the educational dataset and hence increases the performance of classifiers used in EDM techniques. This paper present an analysis of the performance of feature selection algorithms on student data set. .In this papers the different problems that are defined in problem formulation. All these problems are resolved in future. Furthermore the paper is an attempt of playing a positive role in the improvement of education quality, as well as guides new researchers in making academic intervention.


2021 ◽  
Vol 11 (15) ◽  
pp. 6983
Author(s):  
Maritza Mera-Gaona ◽  
Diego M. López ◽  
Rubiel Vargas-Canas

Identifying relevant data to support the automatic analysis of electroencephalograms (EEG) has become a challenge. Although there are many proposals to support the diagnosis of neurological pathologies, the current challenge is to improve the reliability of the tools to classify or detect abnormalities. In this study, we used an ensemble feature selection approach to integrate the advantages of several feature selection algorithms to improve the identification of the characteristics with high power of differentiation in the classification of normal and abnormal EEG signals. Discrimination was evaluated using several classifiers, i.e., decision tree, logistic regression, random forest, and Support Vecctor Machine (SVM); furthermore, performance was assessed by accuracy, specificity, and sensitivity metrics. The evaluation results showed that Ensemble Feature Selection (EFS) is a helpful tool to select relevant features from the EEGs. Thus, the stability calculated for the EFS method proposed was almost perfect in most of the cases evaluated. Moreover, the assessed classifiers evidenced that the models improved in performance when trained with the EFS approach’s features. In addition, the classifier of epileptiform events built using the features selected by the EFS method achieved an accuracy, sensitivity, and specificity of 97.64%, 96.78%, and 97.95%, respectively; finally, the stability of the EFS method evidenced a reliable subset of relevant features. Moreover, the accuracy, sensitivity, and specificity of the EEG detector are equal to or greater than the values reported in the literature.


Author(s):  
Nazila Darabi ◽  
Abdalhossein Rezai ◽  
Seyedeh Shahrbanoo Falahieh Hamidpour

Breast cancer is a common cancer in female. Accurate and early detection of breast cancer can play a vital role in treatment. This paper presents and evaluates a thermogram based Computer-Aided Detection (CAD) system for the detection of breast cancer. In this CAD system, the Random Subset Feature Selection (RSFS) algorithm and hybrid of minimum Redundancy Maximum Relevance (mRMR) algorithm and Genetic Algorithm (GA) with RSFS algorithm are utilized for feature selection. In addition, the Support Vector Machine (SVM) and k-Nearest Neighbors (kNN) algorithms are utilized as classifier algorithm. The proposed CAD system is verified using MATLAB 2017 and a dataset that is composed of breast images from 78 patients. The implementation results demonstrate that using RSFS algorithm for feature selection and kNN and SVM algorithms as classifier have accuracy of 85.36% and 75%, and sensitivity of 94.11% and 79.31%, respectively. In addition, using hybrid GA and RSFS algorithm for feature selection and kNN and SVM algorithms as classifier have accuracy of 83.87% and 69.56%, and sensitivity of 96% and 81.81%, respectively, and using hybrid mRMR and RSFS algorithms for feature selection and kNN and SVM algorithms as classifier have accuracy of 77.41% and 73.07%, and sensitivity of 98% and 72.72%, respectively.


Sign in / Sign up

Export Citation Format

Share Document