Text Dimensionality Reduction for Document Clustering Using Hybrid Memetic Feature Selection

Author(s):  
Ibraheem Al-Jadir ◽  
Kok Wai Wong ◽  
Chun Che Fung ◽  
Hong Xie
2011 ◽  
pp. 279-279
Author(s):  
Geoffrey I. Webb ◽  
Johannes Fürnkranz ◽  
Geoffrey Hinton ◽  
...  

2021 ◽  
Vol 17 (2) ◽  
pp. 1-20
Author(s):  
Zheng Wang ◽  
Qiao Wang ◽  
Tingzhang Zhao ◽  
Chaokun Wang ◽  
Xiaojun Ye

Feature selection, an effective technique for dimensionality reduction, plays an important role in many machine learning systems. Supervised knowledge can significantly improve performance; however, faced with the rapid growth of newly emerging concepts, existing supervised methods can suffer from the scarcity of valid labeled training data. In this paper, the authors study the problem of zero-shot feature selection, i.e., building a feature selection model that generalizes well to “unseen” concepts given only limited training data on “seen” concepts. Specifically, they adopt class-semantic descriptions (i.e., attributes) as supervision for feature selection, so as to exploit supervised knowledge transferred from the seen concepts. To obtain more reliable discriminative features, they further propose the center-characteristic loss, which encourages the selected features to capture the central characteristics of the seen concepts. Extensive experiments on various real-world datasets demonstrate the effectiveness of the method.
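The abstract does not give the exact form of the center-characteristic loss. As a minimal sketch, one plausible reading is an L2 penalty that pulls each seen-class sample, under a soft feature-selection weight vector, toward its class centroid; the function name, the weight vector `w`, and the squared-distance form below are assumptions for illustration, not the authors' published formulation.

```python
import numpy as np

def center_characteristic_loss(X, y, w):
    """Hypothetical center-characteristic-style loss (illustrative only).

    X: (n_samples, n_features) training data for seen concepts
    y: (n_samples,) integer class labels of the seen concepts
    w: (n_features,) soft feature-selection weights in [0, 1]

    Penalizes the weighted distance of each sample to its own class
    centroid, so features with large weights are those that capture
    the central characteristics of the seen classes.
    """
    loss = 0.0
    for c in np.unique(y):
        Xc = X[y == c]
        center = Xc.mean(axis=0)          # centroid of class c
        loss += np.sum(((Xc - center) * w) ** 2)
    return loss / len(X)

# Toy usage: random data, three seen classes, random soft weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))
y = rng.integers(0, 3, size=120)
w = rng.uniform(size=20)
print(center_characteristic_loss(X, y, w))
```

In a full pipeline this term would be minimized jointly with a selection regularizer (e.g., a sparsity penalty on `w`), so that the surviving features are both few and centered on the seen concepts.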


Author(s):  
Edilson Delgado-Trejos ◽  
Germán Castellanos ◽  
Luis G. Sánchez ◽  
Julio F. Suárez

Dimensionality reduction procedures perform well on sets of correlated features, whereas variable selection methods perform poorly: they fail to pick relevant variables because the scores they assign to correlated features are too similar, so none of the variables is strongly preferred over another. Feature selection and dimensionality reduction algorithms therefore have complementary advantages and disadvantages: dimensionality reduction thrives on correlation between variables but fails to select informative features from a set of more complex ones, while variable selection fails when all the features are correlated but succeeds with informative variables (Wolf & Bileschi, 2005). In this work, we propose a feature selection algorithm with heuristic search that uses Multivariate Analysis of Variance (MANOVA) as the cost function. The technique is put to the test by discriminating hypernasal from normal voices in patients with cleft lip and/or palate (CLP). Classification performance, computation time, and reduction ratio are also assessed by comparison with an alternative feature selection method based on unfolding the multivariate analysis into univariate and bivariate analyses. The methodology is effective because it takes into account the statistical and geometric relevance of the features, going beyond a summary analysis of class separability to seek a given quality level in signal representation.
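The abstract names MANOVA as the cost function but does not fix the statistic or the search scheme. As an illustration only, the sketch below pairs one common heuristic, greedy forward selection, with Wilks' lambda, a standard MANOVA separability criterion (lower is better); the function names and the choice of forward search are assumptions, not the authors' exact algorithm.

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' lambda = det(W) / det(T), where W is the within-class
    scatter matrix and T the total scatter matrix. Lower values mean
    the classes are better separated on the given feature subset.
    (Assumes non-degenerate scatter matrices.)"""
    W = np.zeros((X.shape[1], X.shape[1]))
    for c in np.unique(y):
        d = X[y == c] - X[y == c].mean(axis=0)
        W += d.T @ d                      # within-class scatter
    t = X - X.mean(axis=0)
    T = t.T @ t                           # total scatter
    return np.linalg.det(W) / np.linalg.det(T)

def forward_select(X, y, k):
    """Greedy forward search: repeatedly add the single feature that
    most reduces Wilks' lambda on the currently selected subset."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        best = min(remaining,
                   key=lambda j: wilks_lambda(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Forward search evaluates on the order of k·d candidate subsets rather than all 2^d, which speaks to the computation-time comparison raised in the abstract; a different heuristic search could be swapped in for the greedy loop without changing the MANOVA cost function.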

