Dimensionality reduction, including feature extraction and feature selection,
is a key step in text classification. In this paper, we propose a hybrid
method of dimensionality reduction that combines principal component analysis
(PCA) with component selection. PCA is a
method of feature extraction. Not all principal components contribute to
classification, because the PCA objective is to maximize retained variance
rather than class separation, so PCA is not a form of discriminant analysis
(see, e.g., Jolliffe, 2002). In this
context, we present a component selection function that returns the
components useful for classification, based on performance indicators
measured on different subsets of the components. Compared with traditional
feature selection methods, the proposed approach yields SVM classifiers with
improved classification performance and reduced computational overhead.
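
As an illustration only, the pipeline described above can be sketched in
Python with NumPy. The sketch rests on assumptions that are not taken from the
paper: it uses a greedy forward search over components scored by held-out
accuracy, and a nearest-centroid classifier as a lightweight stand-in for the
SVM. The function names (pca_fit, centroid_accuracy, select_components) and
the synthetic data are hypothetical.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the mean and top principal axes of X (rows are samples)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def centroid_accuracy(Z_train, y_train, Z_val, y_val):
    """Validation accuracy of a nearest-centroid classifier (SVM stand-in)."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((Z_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y_val).mean())

def select_components(Z_train, y_train, Z_val, y_val):
    """Greedy forward selection (assumed strategy): repeatedly add the
    component that most improves validation accuracy, stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(Z_train.shape[1]))
    while remaining:
        scores = [centroid_accuracy(Z_train[:, selected + [j]], y_train,
                                    Z_val[:, selected + [j]], y_val)
                  for j in remaining]
        k = int(np.argmax(scores))
        if scores[k] <= best:
            break  # no remaining component improves validation accuracy
        best = scores[k]
        selected.append(remaining.pop(k))
    return selected, best

# Synthetic data (hypothetical): the class signal lies in a low-variance
# direction, so the leading principal component is useless for classification.
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)
X = np.column_stack([
    rng.normal(0.0, 10.0, size=n),                      # high variance, no signal
    (2 * y - 1) * 3.0 + rng.normal(0.0, 0.5, size=n),   # class signal
    rng.normal(0.0, 1.0, size=n),                       # low-variance noise
])
X_train, y_train, X_val, y_val = X[:300], y[:300], X[300:], y[300:]

mean, axes = pca_fit(X_train, n_components=3)
Z_train = (X_train - mean) @ axes.T   # project onto principal axes
Z_val = (X_val - mean) @ axes.T
selected, best = select_components(Z_train, y_train, Z_val, y_val)
print("selected components:", selected, "validation accuracy:", best)
```

In this toy setting the first component chosen is the second principal
component rather than the first, which illustrates why ranking components by
variance alone need not match their usefulness for classification.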