The integration of principal component analysis and cepstral mean subtraction in parallel model combination for robust speech recognition

The presented paper introduces principal component analysis application for dimensionality reduction of variables describing speech signal and applicability of obtained results for the disturbed and fluent speech recognition process. A set of fluent speech signals and three speech disturbances—blocks before words starting with plosives, syllable repetitions, and sound-initial prolongations—was transformed using principal component analysis. The result was a model containing four principal components describing analysed utterances. Distances between standardised original variables and elements of the observation matrix in a new system of coordinates were calculated and then applied in the recognition process. As a classifying algorithm, the multilayer perceptron network was used. Achieved results were compared with outcomes from previous experiments where speech samples were parameterised with the Kohonen network application. The classifying network achieved overall accuracy at 76% (from 50% to 91%, depending on the dysfluency type).

Download Full-text

Employing Robust Principal Component Analysis for Noise-Robust Speech Feature Extraction in Automatic Speech Recognition with the Structure of a Deep Neural Network

Applied System Innovation ◽

10.3390/asi1030028 ◽

2018 ◽

Vol 1 (3) ◽

pp. 28 ◽

Cited By ~ 4

Author(s):

Jeih-weih Hung ◽

Jung-Shan Lin ◽

Po-Jen Wu

Keyword(s):

Neural Network ◽

Principal Component Analysis ◽

Speech Recognition ◽

Automatic Speech Recognition ◽

Deep Neural Network ◽

Principal Component ◽

Component Analysis ◽

Robust Principal Component Analysis ◽

Speech Feature ◽

Noise Robust

In recent decades, researchers have been focused on developing noise-robust methods in order to compensate for noise effects in automatic speech recognition (ASR) systems and enhance their performance. In this paper, we propose a feature-based noise-robust method that employs a novel data analysis technique—robust principal component analysis (RPCA). In the proposed scenario, RPCA is employed to process a noise-corrupted speech feature matrix, and the obtained sparse partition is shown to reveal speech-dominant characteristics. One apparent advantage of using RPCA for enhancing noise robustness is that no prior knowledge about the noise is required. The proposed RPCA-based method is evaluated with the Aurora-4 database and a task using a state-of-the-art deep neural network (DNN) architecture as the acoustic models. The evaluation results indicate that the newly proposed method can provide the original speech feature with significant recognition accuracy improvement, and can be cascaded with mean normalization (MN), mean and variance normalization (MVN), and relative spectral (RASTA)—three well-known and widely used feature robustness algorithms—to achieve better performance compared with the individual component method.

Download Full-text