A Combined Feature Selection Method for Chinese Text Categorization

In many important application domains, such as text categorization, biomolecular analysis, scene or video classification and medical diagnosis, instances are naturally associated with more than one class label, giving rise to multi-label classification problems. This has led, in recent years, to a substantial amount of research in multi-label classification. More specifically, feature selection methods have been developed to allow the identification of relevant and informative features for multi-label classification. This work presents a new feature selection method based on the lazy feature selection paradigm and specific for the multi-label context. Experimental results show that the proposed technique is competitive when compared to multi-label feature selection techniques currently used in the literature, and is clearly more scalable, in a scenario where there is an increasing amount of data.

Download Full-text

A HYBRID FEATURE SELECTION METHOD FOR TEXT CATEGORIZATION

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488507004492 ◽

2007 ◽

Vol 15 (02) ◽

pp. 133-151 ◽

Cited By ~ 2

Author(s):

E. MONTAÑÉS ◽

J. R. QUEVEDO ◽

E. F. COMBARRO ◽

I. DÍAZ ◽

J. RANILLA

Keyword(s):

Feature Selection ◽

Text Categorization ◽

Hybrid Approach ◽

Feature Selection Method ◽

Selection Method ◽

Fast Method ◽

Evaluation Function ◽

Wrapper Approach ◽

Wrapper Method ◽

Filtering Approach

Feature Selection is an important task within Text Categorization, where irrelevant or noisy features are usually present, causing a lost in the performance of the classifiers. Feature Selection in Text Categorization has usually been performed using a filtering approach based on selecting the features with highest score according to certain measures. Measures of this kind come from the Information Retrieval, Information Theory and Machine Learning fields. However, wrapper approaches are known to perform better in Feature Selection than filtering approaches, although they are time-consuming and sometimes infeasible, especially in text domains. However a wrapper that explores a reduced number of feature subsets and that uses a fast method as evaluation function could overcome these difficulties. The wrapper presented in this paper satisfies these properties. Since exploring a reduced number of subsets could result in less promising subsets, a hybrid approach, that combines the wrapper method and some scoring measures, allows to explore more promising feature subsets. A comparison among some scoring measures, the wrapper method and the hybrid approach is performed. The results reveal that the hybrid approach outperforms both the wrapper approach and the scoring measures, particularly for corpora whose features are less scattered over the categories.

Download Full-text

Research on feature selection method in Chinese text automatic classification

Energy Science and Applied Technology ◽

10.1201/b19779-83 ◽

2015 ◽

pp. 359-361

Author(s):

Ying Hong ◽

Zengmin Geng

Keyword(s):

Feature Selection ◽

Chinese Text ◽

Feature Selection Method ◽

Automatic Classification ◽

Selection Method

Download Full-text

A HowNet-based Feature Selection Method for Chinese Text Representation

2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery ◽

10.1109/fskd.2009.280 ◽

2009 ◽

Cited By ~ 3

Author(s):

Changwei Zhao ◽

Xueli Yao ◽

Suhuan Sun

Keyword(s):

Feature Selection ◽

Chinese Text ◽

Feature Selection Method ◽

Selection Method ◽

Text Representation

Download Full-text

Research on Feature Selection Method in Chinese Text Automatic Classification

Proceedings of the 2015 International conference on Applied Science and Engineering Innovation ◽

10.2991/asei-15.2015.349 ◽

2015 ◽

Author(s):

Ying Hong ◽

Xiwen Shao

Keyword(s):

Feature Selection ◽

Chinese Text ◽

Feature Selection Method ◽

Automatic Classification ◽

Selection Method

Download Full-text

Penerapan Ensemble Feature Selection dan Klasterisasi Fitur pada Klasifikasi Dokumen Teks

ComTech Computer Mathematics and Engineering Applications ◽

10.21512/comtech.v4i1.2745 ◽

2013 ◽

Vol 4 (1) ◽

pp. 333

Author(s):

Mediana Aryuni

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Text Categorization ◽

Feature Selection Method ◽

Selection Method ◽

Majority Voting ◽

Iterative Refinement ◽

Ensemble Method ◽

Computational Time ◽

Feature Clustering

An ensemble method is an approach where several classifiers are created from the training data which can be often more accurate than any of the single classifiers, especially if the base classifiers are accurate and different one each other. Menawhile, feature clustering can reduce feature space by joining similar words into one cluster. The objective of this research is to develop a text categorization system that employs feature clustering based on ensemble feature selection. The research methodology consists of text documents preprocessing, feature subspaces generation using the genetic algorithm-based iterative refinement, implementation of base classifiers by applying feature clustering, and classification result integration of each base classifier using both the static selection and majority voting methods. Experimental results show that the computational time consumed in classifying the dataset into 2 and 3 categories using the feature clustering method is 1.18 and 27.04 seconds faster in compared to those that do not employ the feature selection method, respectively. Also, using static selection method, the ensemble feature selection method with genetic algorithm-based iterative refinement produces 10% and 10.66% better accuracy in compared to those produced by the single classifier in classifying the dataset into 2 and 3 categories, respectively. Whilst, using the majority voting method for the same experiment, the similar ensemble method produces 10% and 12% better accuracy than those produced by the single classifier, respectively.

Download Full-text