An Experimental Study of Feature Selection Methods for Text Classification

As the volume of short text documents on the Internet grows rapidly, organizing these short texts well becomes increasingly urgent. However, traditional feature selection methods are not well suited to short text. In this paper, we propose a method that incorporates syntactic information for short text. It emphasizes features that have more dependency relations with other words. Our experiments use an SVM classifier and the Weka machine learning environment. The experimental results show that by incorporating syntactic information into short text, we obtain more powerful features than with traditional feature selection methods such as DF and CHI. The precision of short text classification improved from 86.2% to 90.8%.
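For readers unfamiliar with the two baselines named in the abstract, the following is a minimal sketch of document frequency (DF) and the chi-square (CHI) statistic for one term and one class. It is not the paper's code or its syntactic method; the toy corpus, the labels, and the function names `document_frequency` and `chi_square` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy corpus: each document is a list of tokens with one label.
docs = [
    ["cheap", "flight", "deal"],
    ["flight", "delay"],
    ["match", "score"],
    ["score", "goal", "deal"],
]
labels = ["travel", "travel", "sport", "sport"]


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so a term counts once per document
    return df


def chi_square(docs, labels, term, target):
    """Chi-square statistic of the 2x2 table: term present/absent vs. class target/other."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        in_class = label == target
        if present and in_class:
            a += 1  # term present, target class
        elif present:
            b += 1  # term present, other class
        elif in_class:
            c += 1  # term absent, target class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


df = document_frequency(docs)
flight_score = chi_square(docs, labels, "flight", "travel")
deal_score = chi_square(docs, labels, "deal", "travel")
```

In this toy corpus "flight" and "deal" have the same document frequency, but only "flight" separates the two classes, so CHI ranks it higher while DF cannot tell them apart.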

Review of feature selection methods for text classification

International Journal of Advanced Computer Research ◽

10.19101/ijacr.2020.1048037 ◽

2020 ◽

Vol 10 (49) ◽

pp. 138-152

Author(s):

Muhammad Iqbal ◽

Malik Muneeb Abid ◽

Muhammad Noman Khalid ◽

Amir Manzoor

Keyword(s):

Feature Selection ◽

Text Classification ◽

Selection Methods

Impact of feature selection techniques in Text Classification: An Experimental study

JOURNAL OF MECHANICS OF CONTINUA AND MATHEMATICAL SCIENCES ◽

10.26782/jmcms.spl.3/2019.09.00004 ◽

2019 ◽

Vol 1 (3) ◽

Author(s):

S Rahamat Basha

Keyword(s):

Experimental Study ◽

Feature Selection ◽

Text Classification ◽

Feature Selection Techniques

Improving Classification of Protein Interaction Articles Using Context Similarity-Based Feature Selection

BioMed Research International ◽

10.1155/2015/751646 ◽

2015 ◽

Vol 2015 ◽

pp. 1-10 ◽

Cited By ~ 1

Author(s):

Yifei Chen ◽

Yuxing Sun ◽

Bing-Qing Han

Keyword(s):

Feature Selection ◽

Protein Interaction ◽

Text Classification ◽

Protein Interactions ◽

Reduction Rate ◽

Importance Measure ◽

Context Information ◽

Selection Methods ◽

Term Frequency ◽

Context Similarity

Protein interaction article classification is a text classification task in the biological domain to determine which articles describe protein-protein interactions. Since the feature space in text classification is high-dimensional, feature selection is widely used for reducing the dimensionality of features to speed up computation without sacrificing classification performance. Many existing feature selection methods are based on the statistical measure of document frequency and term frequency. One potential drawback of these methods is that they treat features separately. Hence, first we design a similarity measure between the context information to take word cooccurrences and phrase chunks around the features into account. Then we introduce the similarity of context information to the importance measure of the features to substitute the document and term frequency. Hence we propose new context similarity-based feature selection methods. Their performance is evaluated on two protein interaction article collections and compared against the frequency-based methods. The experimental results reveal that the context similarity-based methods perform better in terms of the F1 measure and the dimension reduction rate. Benefiting from the context information surrounding the features, the proposed methods can select distinctive features effectively for protein interaction article classification.
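The abstract does not give the similarity measure itself, so the following is only a rough sketch of one way to compare the context around a feature: collect the words in a fixed window around each occurrence and take the cosine similarity of the resulting count vectors. The window size, the example sentences, and the names `context_vector` and `cosine` are assumptions, and phrase chunks are not modeled here.

```python
import math
from collections import Counter


def context_vector(tokens, index, window=1):
    """Count the words within `window` positions of tokens[index], excluding the word itself."""
    lo = max(0, index - window)
    hi = min(len(tokens), index + window + 1)
    return Counter(tokens[lo:index] + tokens[index + 1:hi])


def cosine(u, v):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sentences; the feature of interest is "binds" at position 2.
s1 = "protein a binds protein b".split()
s2 = "kinase a binds receptor b".split()

c1 = context_vector(s1, 2)  # neighbours of "binds" in s1: "a", "protein"
c2 = context_vector(s2, 2)  # neighbours of "binds" in s2: "a", "receptor"
similarity = cosine(c1, c2)
```

The two contexts share one of their two neighbouring words, so the similarity falls between 0 and 1. A measure of this kind scores a feature by how consistent its surroundings are rather than by how often it appears.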

Experimental study on feature selection methods for software fault detection

2016 International Conference on Circuit, Power and Computing Technologies (ICCPCT) ◽

10.1109/iccpct.2016.7530156 ◽

2016 ◽

Cited By ~ 2

Author(s):

D. Asir Antony Gnana Singh ◽

A. Escalin Fernando ◽

E. Jebamalar Leavline

Keyword(s):

Experimental Study ◽

Feature Selection ◽

Fault Detection ◽

Selection Methods ◽

Software Fault

Evaluation of feature selection methods for text classification with small datasets using multiple criteria decision-making methods

Applied Soft Computing ◽

10.1016/j.asoc.2019.105836 ◽

2020 ◽

Vol 86 ◽

pp. 105836 ◽

Cited By ~ 56

Author(s):

Gang Kou ◽

Pei Yang ◽

Yi Peng ◽

Feng Xiao ◽

Yang Chen ◽

...

Keyword(s):

Decision Making ◽

Feature Selection ◽

Text Classification ◽

Multiple Criteria Decision Making ◽

Multiple Criteria ◽

Selection Methods

A NEW FEATURE SELECTION METHOD FOR TEXT CLASSIFICATION

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001407005466 ◽

2007 ◽

Vol 21 (02) ◽

pp. 423-438 ◽

Cited By ~ 9

Author(s):

GULDEN UCHYIGIT ◽

KEITH CLARK

Keyword(s):

Feature Selection ◽

Text Classification ◽

Information Gain ◽

Feature Selection Method ◽

Feature Space ◽

Selection Method ◽

Computational Time ◽

Small Subset ◽

Selection Methods ◽

New Feature

Text classification is the problem of classifying a set of documents into a pre-defined set of classes. A major problem in text classification is the high dimensionality of the feature space. Only a small subset of the words are feature words that can be used to determine a document's class, while the rest add noise, can make the results unreliable, and significantly increase computational time. A common approach to this problem is feature selection, where the number of words in the feature space is significantly reduced. In this paper we present the experiments of a comparative study of feature selection methods used for text classification. Ten feature selection methods were evaluated in this study, including a new feature selection method called the GU metric. The other feature selection methods evaluated in this study are: Chi-Squared (χ²) statistic, NGL coefficient, GSS coefficient, Mutual Information, Information Gain, Odds Ratio, Term Frequency, Fisher Criterion, and BSS/WSS coefficient. The experimental evaluations show that the GU metric obtained the best F1 and F2 scores. The experiments were performed on the 20 Newsgroups data sets with the Naive Bayesian Probabilistic Classifier.
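As an illustration of one of the established metrics listed above, the sketch below computes Information Gain for a term as the reduction in class entropy after splitting the documents on the term's presence. It does not implement the GU metric, whose definition is not given in the abstract. The toy corpus and the function names `entropy` and `information_gain` are assumptions.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of tokens with one label.
docs = [
    ["cheap", "flight", "deal"],
    ["flight", "delay"],
    ["match", "score"],
    ["score", "goal", "deal"],
]
labels = ["travel", "travel", "sport", "sport"]


def entropy(class_labels):
    """Shannon entropy (in bits) of a list of class labels; 0.0 for an empty list."""
    n = len(class_labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(class_labels).values())


def information_gain(docs, labels, term):
    """Class entropy minus the weighted entropy after splitting on presence of `term`."""
    with_term = [lab for toks, lab in zip(docs, labels) if term in toks]
    without_term = [lab for toks, lab in zip(docs, labels) if term not in toks]
    n = len(labels)
    conditional = (
        len(with_term) / n * entropy(with_term)
        + len(without_term) / n * entropy(without_term)
    )
    return entropy(labels) - conditional


ig_flight = information_gain(docs, labels, "flight")
ig_deal = information_gain(docs, labels, "deal")
```

Selecting features then amounts to scoring every term in the vocabulary and keeping the highest-scoring subset, which is how a metric of this kind reduces the dimensionality of the feature space.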

Research on N-grams feature selection methods for text classification

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/1031/1/012048 ◽

2021 ◽

Vol 1031 (1) ◽

pp. 012048

Author(s):

Tsvetanka Georgieva-Trifonova ◽

Mahmut Duraku

Keyword(s):

Feature Selection ◽

Text Classification ◽

Selection Methods
