Dimensionality reduction in text classification using scatter method

Author(s):  
Jyri Saarikoski ◽  
Jorma Laurikkala ◽  
Kalervo Järvelin ◽  
Markku Siermala ◽  
Martti Juhola
Filomat ◽  
2018 ◽  
Vol 32 (5) ◽  
pp. 1499-1506 ◽  
Author(s):  
Yangwu Zhang ◽  
Guohe Li ◽  
Heng Zong

Dimensionality reduction, which includes feature extraction and feature selection, is a key step in text classification. In this paper, we propose a mixed method of dimensionality reduction that combines principal component analysis (PCA) with the selection of components. PCA is a method of feature extraction. Not all principal components contribute to classification, because the PCA objective is not a form of discriminant analysis (see, e.g., Jolliffe, 2002). We therefore present a component selection function that returns the components useful for classification, using as indicators the classification performance obtained on different subsets of the components. Compared to traditional methods of feature selection, SVM classifiers trained on the selected components show improved classification performance and reduced computational overhead.
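The idea of keeping only the principal components that help classification can be illustrated with a minimal sketch. It is not the authors' selection function: the greedy forward search, the toy data, and the names `pca_fit`, `centroid_accuracy` and `select_components` are all assumptions made for illustration, and a nearest-centroid classifier stands in for the SVM so that the example needs only NumPy.

```python
import numpy as np

def pca_fit(X, k):
    """Return the mean and the top-k principal axes (as rows) of X."""
    mean = X.mean(axis=0)
    # SVD of the centred data; rows of vt are principal axes sorted by variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def centroid_accuracy(Z_train, y_train, Z_val, y_val):
    """Validation accuracy of a nearest-centroid classifier (stand-in for an SVM)."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((Z_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y_val).mean())

def select_components(Z_train, y_train, Z_val, y_val):
    """Greedy forward selection (an assumed strategy): add a component
    only if it strictly improves validation accuracy."""
    selected, best = [], 0.0
    remaining = list(range(Z_train.shape[1]))
    while remaining:
        scores = [
            centroid_accuracy(Z_train[:, selected + [j]], y_train,
                              Z_val[:, selected + [j]], y_val)
            for j in remaining
        ]
        i = int(np.argmax(scores))
        if scores[i] <= best:
            break
        best = scores[i]
        selected.append(remaining.pop(i))
    return selected, best

# Toy data (hypothetical): two high-variance noise features and one
# low-variance feature that actually separates the two classes.
rng = np.random.default_rng(0)
n = 400
y = np.arange(n) % 2
X = np.column_stack([
    rng.normal(0, 10, n),
    rng.normal(0, 5, n),
    (2 * y - 1) + rng.normal(0, 0.1, n),
])

train, val = slice(0, 300), slice(300, n)
mean, axes = pca_fit(X[train], k=3)
Z = (X - mean) @ axes.T          # project all samples onto the principal axes
selected, acc = select_components(Z[train], y[train], Z[val], y[val])
print("selected components:", selected, "validation accuracy:", acc)
```

In this toy setting the class-separating direction carries the least variance, so a variance-ranked truncation of the components would discard it, whereas a performance-driven selection retains it. That is the motivation the abstract gives for selecting components after PCA.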


2020 ◽  
Vol 19 (04) ◽  
pp. 2050039
Author(s):  
Jorge Chamorro-Padial ◽  
Rosa Rodríguez-Sánchez

This paper proposes a new method of dimensionality reduction for text classification that applies the discrete wavelet transform to the document-term frequency matrix. We analyse the features provided by the wavelet coefficients in the different orientations: (1) The high energy coefficients in the horizontal orientation correspond to relevant terms in a single document. (2) The high energy coefficients in the vertical orientation correspond to relevant terms for a single document, but not for the others. (3) The high energy coefficients in the diagonal orientation correspond to relevant terms in a document in comparison to other terms. If we filter using the wavelet coefficients and require these three conditions to hold simultaneously, we obtain a reduced vocabulary of the corpus with fewer dimensions than the original one. To test the reduced vocabulary, we recoded the corpus with it and obtained a statistically relevant level of accuracy for document classification.
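A rough sketch of this kind of filtering, under several assumptions, is given here. The abstract does not state the wavelet family, the decomposition level, the orientation naming, or the threshold, so the single-level Haar transform, the labels `horiz`/`vert`/`diag`, the threshold `tau`, and the toy matrix are all hypothetical choices made for illustration.

```python
import numpy as np

def haar_dwt2(M):
    """Single-level 2D Haar transform of a matrix with an even number of
    rows (documents) and columns (terms). Orientation names are a convention
    chosen for this sketch."""
    a, b = M[0::2, 0::2], M[0::2, 1::2]
    c, d = M[1::2, 0::2], M[1::2, 1::2]
    approx = (a + b + c + d) / 2.0
    horiz = (a + b - c - d) / 2.0   # contrast between the two documents of a block
    vert = (a - b + c - d) / 2.0    # contrast between the two terms of a block
    diag = (a - b - c + d) / 2.0    # contrast along the diagonal of a block
    return approx, horiz, vert, diag

# Toy document-term frequency matrix (hypothetical): 4 documents x 6 terms.
# Term 0 is frequent in document 0 only; terms 2 and 3 are uniform everywhere;
# term 4 occurs equally in every document.
M = np.array([
    [4, 0, 1, 1, 2, 0],
    [0, 0, 1, 1, 2, 0],
    [0, 0, 1, 1, 2, 0],
    [0, 0, 1, 1, 2, 0],
], dtype=float)

_, horiz, vert, diag = haar_dwt2(M)

def column_energy(C):
    """Sum of squared coefficients over documents, per term-pair column."""
    return (C ** 2).sum(axis=0)

tau = 1.0  # assumed energy threshold
keep = (
    (column_energy(horiz) > tau)
    & (column_energy(vert) > tau)
    & (column_energy(diag) > tau)
)

# Each coefficient column covers a pair of adjacent terms at this level.
kept_terms = [int(t) for blk in np.flatnonzero(keep) for t in (2 * blk, 2 * blk + 1)]
reduced = M[:, kept_terms]       # corpus recoded with the reduced vocabulary
print("kept term indices:", kept_terms)
```

For this matrix only the first term pair has high energy in all three orientations, so `kept_terms` is `[0, 1]`. The uniform terms produce no detail energy at all, and the term that is equally frequent in every document has energy in one orientation only. A single Haar level resolves terms in pairs, which is a limitation of this sketch rather than a property claimed by the paper.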


2012 ◽  
Vol 28 (2) ◽  
pp. 115-138 ◽  
Author(s):  
Richard A. McAllister ◽  
Rafal A. Angryk
