A review on feature selection and feature extraction for text classification

2018 · Vol 7 (2.27) · pp. 156 · Cited By ~ 1

Author(s): Bipanjyot Kaur, Gourav Bathla

Keyword(s): Feature Extraction, Feature Selection, Language Processing, Text Classification, Complete Classification, Learning Technique, Hybrid Classification, Classification Evaluation, Class Labels, Evaluation Parameters

Text classification is the task of assigning a class or label to a document from a set of predefined class labels, such as sports, business, technology, education and science. Classification is a supervised learning technique: a classifier is trained on documents labelled with these classes using certain features, and a new document is then classified according to a similarity measure against this trained document set. Text classification is used in many applications, such as labelling documents, separating spam messages from genuine ones, filtering text and natural language processing. Assigning a label to a document involves three phases: feature extraction, feature selection and classification. In this paper, Principal Component Analysis (PCA) is used for feature extraction, Artificial Bee Colony (ABC) optimization for feature selection and a Support Vector Machine (SVM) for classification. In the proposed approach, PCA is improved by applying normalization based on the size of the features, which removes redundant features to a larger extent. Very few research works have implemented PCA, ABC and SVM together for the complete classification process. Accuracy, F-measure and G-mean are calculated to evaluate classifier efficiency. The proposed system is deployed on the 20 Newsgroups dataset, and the experimental analysis shows that the proposed approach improves accuracy compared with existing approaches.
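
The abstract names accuracy, F-measure and G-mean but does not give their formulas. A minimal Python sketch of the standard binary (one-vs-rest) definitions follows, assuming G-mean is the geometric mean of sensitivity (recall) and specificity. How the paper averages these over the 20 Newsgroups classes is not stated, so the function names and class labels here are illustrative only.

```python
import math

def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN treating one class as 'positive' (one-vs-rest)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def evaluate(y_true, y_pred, positive):
    """Accuracy, F-measure and G-mean for one class (hypothetical helper)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    g_mean = math.sqrt(recall * specificity)
    return {"accuracy": accuracy, "f_measure": f_measure, "g_mean": g_mean}

# Toy predictions: TP=2, FN=1, TN=4, FP=1 for the class "sports".
y_true = ["sports"] * 3 + ["business"] * 5
y_pred = ["sports", "sports", "business", "business",
          "business", "business", "business", "sports"]
print(evaluate(y_true, y_pred, "sports"))
```

G-mean penalizes a classifier that does well on one side of the split but poorly on the other, which is why it is often reported alongside accuracy when classes are imbalanced.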

Enhancing Effectiveness of Dimension Reduction in Text Classification

International Journal of Artificial Intelligence Tools · 10.1142/s0218213017500087 · 2016 · Vol 26 (03) · pp. 1750008 · Cited By ~ 1

Author(s): Seyyed Hossein Seyyedi, Behrouz Minaei-Bidgoli

Keyword(s): Feature Extraction, Feature Selection, Dimension Reduction, Text Classification, Field Experiments, Information Gain, Dimensional Space, Feature Space, Spam Detection, Selection Methods

Nowadays, text is one of the most prevalent forms of data, and text classification is a widely used data mining task with many application fields. One mass-produced instance of text is email. Despite its many advantages as a communication medium, email suffers from a serious problem: the number of spam emails has steadily increased in recent years, causing considerable irritation. Spam detection has therefore emerged as a separate field of text classification. A primary challenge of text classification, which is more severe in spam detection and impedes the process, is the high dimensionality of the feature space. Various dimension reduction methods have been proposed that produce a lower-dimensional space than the original one. These methods fall mainly into two groups: feature selection and feature extraction. This research deals with dimension reduction in text classification, with experiments focused on spam detection. We employ Information Gain (IG) and the Chi-square statistic (CHI) as well-known feature selection methods. We also propose a new feature extraction method called Sprinkled Semantic Feature Space (SSFS). Furthermore, this paper presents a new hybrid method called IG_SSFS, which combines the selection and extraction processes to reap the benefits of both. To evaluate these methods for spam detection, experiments are conducted on several well-known email datasets. According to the results, SSFS was more effective than the basic selection methods at improving classifier performance, and IG_SSFS enhanced performance further while consuming less processing time.
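
Information Gain and the Chi-square statistic both score each term independently, and only the top-scoring terms are kept. The sketch below uses the textbook definitions over binary term presence (a term either occurs in a document or it does not). The toy spam/ham corpus and the helper names are hypothetical, and SSFS and IG_SSFS are not reproduced because the abstract does not describe them in enough detail.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document'."""
    n = len(labels)
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    conditional = sum(len(s) / n * entropy(Counter(s).values())
                      for s in (with_t, without_t) if s)
    return entropy(Counter(labels).values()) - conditional

def chi_square(docs, labels, term, cls):
    """CHI for term vs. class from the 2x2 presence/class contingency table."""
    a = b = c = d = 0  # a: term & cls, b: term & other, c: no term & cls, d: neither
    for doc, y in zip(docs, labels):
        if term in doc:
            if y == cls:
                a += 1
            else:
                b += 1
        elif y == cls:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0

def select_top_k(docs, k, score):
    """Rank the vocabulary by score (ties broken alphabetically) and keep k terms."""
    vocab = sorted(set().union(*docs))
    return sorted(vocab, key=lambda t: (-score(t), t))[:k]

# Hypothetical toy corpus: each document is a set of tokens.
docs = [set(s.split()) for s in (
    "win money now", "free money offer",
    "meeting agenda attached", "lunch meeting today")]
labels = ["spam", "spam", "ham", "ham"]

print(select_top_k(docs, 2, lambda t: information_gain(docs, labels, t)))  # ['meeting', 'money']
print(select_top_k(docs, 2, lambda t: chi_square(docs, labels, t, "spam")))  # ['meeting', 'money']
```

With only two classes the chi-square score is the same whichever class is passed as `cls`; with more classes, a common choice is to take the maximum or a weighted average of the per-class scores.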

Feature Extraction or Feature Selection for Text Classification: A Case Study on Phishing Email Detection

International Journal of Information Engineering and Electronic Business · 10.5815/ijieeb.2015.02.08 · 2015 · Vol 7 (2) · pp. 60-65 · Cited By ~ 23

Author(s): Masoumeh Zareapoor, Seeja K. R

Keyword(s): Feature Extraction, Feature Selection, Text Classification

Survey of Feature Selection and Text Classification Methods for Genetic Mutation Classification

International Journal of Computer Sciences and Engineering · 10.26438/ijcse/v7i4.933937 · 2019 · Vol 7 (4) · pp. 933-937

Author(s): Varun Saproo, Rujuta Upadhyay, Manisha Valera

Keyword(s): Feature Selection, Text Classification, Genetic Mutation, Classification Methods

Impact of Feature Extraction and Feature Selection on Indonesian Personality Trait Classification

2020 3rd International Conference on Information and Communications Technology (ICOIACT) · 10.1109/icoiact50329.2020.9332107 · 2020

Author(s): Ahmad Fikri Iskandar, Ema Utami, Agung Budi Prasetio

Keyword(s): Feature Extraction, Feature Selection, Personality Trait

A systematic mapping of feature extraction and feature selection methods of electroencephalogram signals for neurological diseases diagnostic assistance

IEEE Latin America Transactions · 10.1109/tla.2021.9448287 · 2021 · Vol 19 (5) · pp. 735-745

Author(s): Wallace Faveron de Almeida, Clodoaldo Aparecido de Moraes Lima, Sarajane Marques Peres

Keyword(s): Feature Extraction, Feature Selection, Neurological Diseases, Selection Methods, Systematic Mapping

Extensive Survey on Feature Extraction and Feature Selection Techniques for Sentiment Classification in Social Media

2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT) · 10.1109/icccnt45670.2019.8944391 · 2019

Author(s): S. Sathish Kumar, Aruchamy Rajini

Keyword(s): Social Media, Feature Extraction, Feature Selection, Sentiment Classification, Extensive Survey, Feature Selection Techniques

Design of Text Categorization System Based on SVM

Advanced Materials Research · 10.4028/www.scientific.net/amr.532-533.1191 · 2012 · Vol 532-533 · pp. 1191-1195 · Cited By ~ 1

Author(s): Zhen Yan Liu, Wei Ping Wang, Yong Wang

Keyword(s): Feature Extraction, Feature Selection, Text Categorization, Feature Selection Method, Extraction Methods, Support Vector, Text Representation, Text Feature, Categorization System, Classifier Training

This paper introduces the design of a text categorization system based on the Support Vector Machine (SVM). It analyzes the high-dimensional nature of text data and explains why SVM is suitable for text categorization. The system is constructed according to its data flow and consists of three subsystems: text representation, classifier training and text classification. Classifier training is the core of the system, but text representation directly influences the accuracy of the classifier and the performance of the whole system. The text feature vector space can be built with different feature selection and feature extraction methods. Because no research has established which method is best, many feature selection and feature extraction methods are implemented in this system. For a specific classification task, every feature selection method and every feature extraction method is tested, and the best-performing set of methods is then adopted.
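
The three subsystems can be sketched in a few dozen lines of Python. This is not the authors' implementation: the stop-word list and toy documents are hypothetical, text representation is assumed to be an L2-normalised term-frequency vector, and the linear SVM is trained with the Pegasos stochastic sub-gradient method (no bias term), swapped in because the abstract does not say which SVM solver the system uses. The step that tries several feature selection and extraction methods and keeps the best is omitted.

```python
import math
import random
from collections import Counter

STOP_WORDS = {"the", "a", "an", "and", "in", "of"}  # hypothetical stop-word list

# 1. Text representation: L2-normalised term frequencies as a sparse dict.
def represent(text):
    tf = Counter(w for w in text.lower().split() if w not in STOP_WORDS)
    norm = math.sqrt(sum(v * v for v in tf.values()))
    return {w: v / norm for w, v in tf.items()} if norm else {}

def dot(w, x):
    return sum(w.get(k, 0.0) * v for k, v in x.items())

# 2. Classifier training: linear SVM (no bias) via Pegasos sub-gradient steps.
def train_svm(examples, lam=0.01, iterations=1000, seed=0):
    rng = random.Random(seed)
    w = {}
    for t in range(1, iterations + 1):
        x, y = rng.choice(examples)        # y is +1 or -1
        eta = 1.0 / (lam * t)              # Pegasos step size
        violated = y * dot(w, x) < 1       # hinge loss active for this example
        for k in w:                        # regulariser shrink: w *= (1 - eta*lam)
            w[k] *= 1.0 - eta * lam
        if violated:                       # hinge-loss sub-gradient step
            for k, v in x.items():
                w[k] = w.get(k, 0.0) + eta * y * v
    return w

# 3. Text classification: sign of the linear decision function.
def classify(w, text):
    return 1 if dot(w, represent(text)) > 0 else -1

# Hypothetical toy data: +1 = sports, -1 = business.
corpus = [
    ("the team won the match", 1),
    ("a late goal decided the match", 1),
    ("the team player scored a goal", 1),
    ("stocks fell in a weak market", -1),
    ("profit and revenue rose", -1),
    ("investors sold stocks after the profit warning", -1),
]
weights = train_svm([(represent(text), y) for text, y in corpus])
print(classify(weights, "team goal match"))       # expected: 1 (sports)
print(classify(weights, "stocks profit market"))  # expected: -1 (business)
```

In this toy corpus the two classes share no vocabulary once stop words are removed, so each class's words can only accumulate weights of that class's sign; a real corpus would need held-out evaluation and the feature selection and extraction step the abstract describes.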