A Novel Statistical Feature Selection Approach for Text Categorization

Author(s):  
Mohamed Abdel Fattah
2014 ◽  
Vol 45 ◽  
pp. 1-10 ◽  
Author(s):  
Deqing Wang ◽  
Hui Zhang ◽  
Rui Liu ◽  
Weifeng Lv ◽  
Datao Wang

Author(s):  
Mohammad Mojaveriyan ◽  
Hossein Ebrahimpour-komleh ◽  
Seyed jalaleddin Mousavirad

2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Jianzhong Wang ◽  
Shuang Zhou ◽  
Yugen Yi ◽  
Jun Kong

Feature selection is a key issue in machine learning and related fields. Its results directly affect a classifier’s accuracy and generalization performance. Recently, a statistical feature selection method named effective range based gene selection (ERGS) was proposed. However, ERGS considers only the overlapping area (OA) among the effective ranges of each class for every feature; it fails to handle the case where one effective range includes another. To overcome this limitation, this paper proposes a novel, efficient statistical feature selection approach called improved feature selection based on effective range (IFSER). In IFSER, an including area (IA) is introduced to characterize the inclusion relation of effective ranges. Moreover, the proportion of samples for each feature of every class in both OA and IA is taken into consideration. As a result, IFSER outperforms the original ERGS and some other state-of-the-art algorithms. Experiments on several well-known databases demonstrate the effectiveness of the proposed method.
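
The abstract does not define effective ranges, OA, or IA, so the following is only a minimal sketch of the underlying idea, not the ERGS or IFSER formulas. It assumes, purely for illustration, that the effective range of a class for one feature is the interval mean ± k · standard deviation, and it uses hypothetical helper names (`effective_range`, `overlap_length`, `includes`). It shows how an overlap between two class ranges can be measured and how an inclusion relation can be detected separately.

```python
from statistics import mean, pstdev


def effective_range(values, k=1.0):
    """Return (low, high) for one class's values of a single feature.

    ASSUMPTION: the range is mean -/+ k * population std; the papers
    may define the effective range differently.
    """
    m = mean(values)
    s = pstdev(values)
    return (m - k * s, m + k * s)


def overlap_length(a, b):
    """Length of the intersection of two closed intervals (0.0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def includes(outer, inner):
    """True when interval `inner` lies entirely within interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]
```

In this sketch, when one range fully contains another, `overlap_length` simply returns the length of the inner range, which is why a separate check such as `includes` is needed to tell inclusion apart from partial overlap.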


2016 ◽  
Vol 2016 ◽  
pp. 1-8 ◽  
Author(s):  
Hongfang Zhou ◽  
Jie Guo ◽  
Yinghui Wang ◽  
Minghua Zhao

Feature selection plays a critical role in text categorization. During feature selection, high-frequency terms and the interclass and intraclass relative contributions of terms all have significant effects on classification results. We therefore put forward IIRCT, a feature selection approach based on the interclass and intraclass relative contributions of terms. The proposed algorithm jointly considers three critical factors: term frequency, the interclass relative contribution of terms, and the intraclass relative contribution of terms. Experiments with a kNN classifier on the 20 NewsGroup and SougouCS corpora show that IIRCT achieves better performance than the DF, t-Test, and CMFS algorithms.
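
The abstract gives no formula for IIRCT, so it is not reproduced here. For context, the following is a minimal sketch of document frequency (DF) selection, one of the baselines named above: a term's score is the number of documents that contain it, and the highest-scoring terms are kept. The function names and the tie-breaking rule (alphabetical order) are assumptions made for this illustration.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it at least once.

    `docs` is an iterable of token lists (already tokenized).
    """
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # set() so repeats within a document count once
    return df


def select_top_terms(docs, n):
    """Return the n terms with the highest document frequency.

    ASSUMPTION: ties are broken alphabetically to keep the result deterministic.
    """
    df = document_frequency(docs)
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]
```

DF ignores class labels entirely, which is the kind of limitation that class-aware criteria such as interclass and intraclass contributions are meant to address.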


2009 ◽  
Vol 29 (7) ◽  
pp. 1755-1757
Author(s):  
Zhong-yang XIONG ◽  
Jian JIANG ◽  
Yu-fang ZHANG
