Model adaptation method for recognition of speech with missing frames

2014 ◽  
Vol 135 (3) ◽  
pp. EL166-EL171 ◽  
Author(s):  
Lee-Min Lee ◽  
Fu-Rong Jean
Author(s):  
Chung-Hsien Wu ◽  
Hung-Yu Su ◽  
Chao-Hong Liu

This chapter presents an efficient approach to personalized pronunciation assessment of Taiwanese-accented English. The main goal of the study is to detect frequently occurring mispronunciation patterns in Taiwanese-accented English rather than to score English pronunciation directly. The proposed assessment helps to quickly discover a student's personalized mispronunciations, so that English teachers can spend more time on teaching or correcting students’ pronunciation. In this approach, an unsupervised model adaptation method is applied to the universal acoustic models so that they can recognize the speech of a specific speaker with mispronunciations and a Taiwanese accent. To extract personalized mispronunciations quickly, a dynamic sentence selection algorithm that considers the mutual information of related mispronunciations is proposed to choose the sentence containing the most undetected mispronunciations. The experimental results show that the proposed unsupervised adaptation approach improves accuracy by about 2.1% on the recognition of Taiwanese-accented English speech.
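The sentence selection idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it ignores the mutual-information term and simply picks the sentence that covers the most mispronunciation patterns not yet detected. The function name `select_next_sentence`, the sentence ids, and the pattern labels are all hypothetical.

```python
def select_next_sentence(sentences, detected):
    """Return the id of the sentence covering the most undetected patterns.

    sentences: dict mapping a sentence id to the set of mispronunciation
               patterns that the sentence can elicit (hypothetical data).
    detected:  set of patterns already detected for this speaker.
    Returns None when no sentence adds a new pattern.
    """
    best_id, best_gain = None, 0
    for sentence_id, patterns in sentences.items():
        # Count only the patterns that have not been detected so far.
        gain = len(set(patterns) - detected)
        if gain > best_gain:
            best_id, best_gain = sentence_id, gain
    return best_id


# Hypothetical prompts and pattern labels, for illustration only.
sentences = {
    "s1": {"th->s", "v->b"},
    "s2": {"th->s", "r->l", "ae->e"},
    "s3": {"v->b"},
}
detected = {"th->s"}
choice = select_next_sentence(sentences, detected)
print(choice)  # s2 adds two undetected patterns, s1 and s3 add one each
```

In a dynamic setting, the chosen sentence would be read by the student, the newly detected patterns added to `detected`, and the selection repeated until it returns `None`.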


Author(s):  
HEUNGKYU LEE ◽  
JUNE KIM

This paper proposes an online noise model adaptation technique that uses a modified quantile-based noise estimation method for feature compensation of noisy speech. The compensation is based on a Gaussian mixture model and targets a robust speech recognition interface in real car environments. The proposed method is designed as an active online model adaptation method that copes with varying environmental noise conditions and enhances speech recognition accuracy. Compensation is performed in the logarithmic filter-bank energy domain, and the modified quantile-based noise estimation method, which uses a beta-order harmonic mean, is employed in the online noise estimation procedure. Experimental evaluation on the Aurora 2 speech database shows that the method gives more robust results than the other algorithms compared.
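For readers unfamiliar with quantile-based noise estimation, the following is a minimal sketch of the plain (unmodified) idea, assuming the input is a short buffer of log filter-bank energy frames. In each band the energies are sorted over time and the value at quantile `q` is taken as the noise level, on the assumption that speech is absent from each band for a sizeable fraction of frames. The beta-order harmonic mean modification described in the abstract is not reproduced here, and the function name and sample values are hypothetical.

```python
def quantile_noise_estimate(frames, q=0.5):
    """Estimate the noise level per band as the q-quantile over time.

    frames: list of frames, each a list of log filter-bank energies
            (one value per band); all frames must have the same length.
    q:      quantile in [0, 1]; 0.5 is the median.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    num_bands = len(frames[0])
    estimate = []
    for band in range(num_bands):
        # Sort this band's energies across all buffered frames.
        energies = sorted(frame[band] for frame in frames)
        # Pick the element at the requested quantile (no interpolation).
        index = int(q * (len(energies) - 1))
        estimate.append(energies[index])
    return estimate


# Hypothetical buffer: band 0 has two loud (speech) frames, band 1 is steady.
frames = [
    [1.0, 5.0],
    [9.0, 5.5],
    [1.2, 4.8],
    [8.5, 5.1],
    [1.1, 5.2],
]
noise = quantile_noise_estimate(frames, q=0.5)
print(noise)  # [1.2, 5.1]
```

An online variant would keep a sliding buffer of recent frames and recompute the estimate as new frames arrive.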


Author(s):  
Ignacio Viñals ◽  
Alfonso Ortega ◽  
Jesús Villalba ◽  
Antonio Miguel ◽  
Eduardo Lleida

We present a novel model adaptation approach to deal with data variability for speaker diarization in a broadcast environment. Expensive human-annotated data can be used to mitigate the domain mismatch by means of supervised model adaptation approaches. By contrast, we propose an unsupervised adaptation method that needs no in-domain labeled data, only the recording being diarized. We rely on an inner adaptation block that combines Agglomerative Hierarchical Clustering (AHC) and Mean-Shift (MS) clustering techniques with a Fully Bayesian Probabilistic Linear Discriminant Analysis (PLDA) to produce pseudo-speaker labels suitable for model adaptation. We propose multiple adaptation approaches based on this basic block, including unsupervised and semi-supervised ones. Our proposed solutions, analyzed with the Multi-Genre Broadcast 2015 (MGB) dataset, show significant improvements (16% relative improvement) with respect to the baseline, and also outperform a supervised adaptation proposal with low resources (9% relative improvement). Furthermore, our proposed unsupervised adaptation is fully compatible with a supervised one. The joint use of both adaptation techniques (supervised and unsupervised) shows a 13% relative improvement over the supervised adaptation alone.

