Effects of Syllable Language Model on Distinctive Phonetic Features (DPFs) based Phoneme Recognition Performance

2010 ◽  
Vol 5 (6) ◽  
Author(s):  
Mohammad Nurul Huda ◽  
Manoj Banik ◽  
Ghulam Muhammad ◽  
Mashud Kabir ◽  
Bernd J. Kröger
2019 ◽  
Vol 105 (6) ◽  
pp. 1269-1277 ◽  
Author(s):  
Yousef A. Alotaibi ◽  
Sid-Ahmed Selouani ◽  
Mohammed Sidi Yakoub ◽  
Yasser Mohammed Seddiq ◽  
Ali Meftah

The robustness of speech classification and recognition systems can be improved by adopting language-specific distinctive phonetic feature (DPF) elements, which allow a speech signal to be characterized more effectively. This paper presents the results of applying Hidden Markov Models (HMMs) to Arabic phoneme recognition together with the classification of the phonemes' DPF element classes. The research focuses on classifying Modern Standard Arabic (MSA) phonemes within isolated words without a language context. HMM-based phoneme recognition is tested using 8, 16, and 32 HMM Gaussian mixture models. The monophone configuration is designed with consideration of a 2-gram language model to evaluate the inherent performance of the system. The overall correct rates for classifying DPF element classes are 83.29%, 88.96%, and 92.70% for the 8, 16, and 32 HMM Gaussian mixture model systems, respectively.
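
As a rough illustration of what a 2-gram model over phoneme labels involves, the sketch below estimates unsmoothed maximum-likelihood bigram probabilities from a few label sequences. The abstract does not say how the authors estimated or smoothed their model, so the function name `train_bigram`, the boundary markers, and the toy phoneme sequences are all assumptions made for this example.

```python
from collections import Counter

def train_bigram(sequences):
    """Maximum-likelihood bigram probabilities P(cur | prev), no smoothing."""
    history_counts = Counter()
    pair_counts = Counter()
    for seq in sequences:
        # Sequence-boundary markers are an illustrative convention.
        tokens = ["<s>"] + list(seq) + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            pair_counts[(prev, cur)] += 1
    return {
        (prev, cur): count / history_counts[prev]
        for (prev, cur), count in pair_counts.items()
    }

# Hypothetical phoneme transcriptions, not taken from the paper.
probs = train_bigram([["k", "a", "t"], ["k", "i"]])
print(probs[("k", "a")])
```

A practical recognizer would normally add smoothing or back-off so that unseen phoneme pairs do not receive zero probability.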


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 200395-200411
Author(s):  
Ahmed B. Ibrahim ◽  
Yasser Mohammad Seddiq ◽  
Ali Hamid Meftah ◽  
Mansour Alghamdi ◽  
Sid-Ahmed Selouani ◽  
...  

Author(s):  
Mohammed Rokibul Alam Kotwal ◽  
Foyzul Hassan ◽  
Mohammad Nurul Huda

This chapter presents Bangla (widely known as Bengali) Automatic Speech Recognition (ASR) techniques by evaluating different speech features, such as Mel Frequency Cepstral Coefficients (MFCCs), Local Features (LFs), and phoneme probabilities extracted by time-delay artificial neural networks of different architectures. Moreover, canonicalization of speech features is performed for Gender-Independent (GI) ASR. In the canonicalization process, the authors design three classifiers, for male, female, and GI speakers, and extract the output probabilities from these classifiers to find the maximum. Maximizing the output probabilities for each speech file provides higher correctness and accuracy for GI speech recognition. In addition, dynamic parameters (velocity and acceleration coefficients) are used in the experiments to obtain higher accuracy in phoneme recognition. The experiments also show that dynamic parameters combined with hybrid features increase phoneme recognition performance to a certain extent. These parameters not only increase the accuracy of the ASR system, but also reduce the computational complexity of Hidden Markov Model (HMM)-based classifiers by requiring fewer mixture components.
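
The dynamic parameters mentioned above are commonly computed with a regression formula over neighbouring frames, applied once for velocity and again for acceleration. The sketch below shows that common formulation in plain Python. The window size of 2, the edge handling, and the function name `delta` are assumptions, and the chapter may compute its coefficients differently.

```python
def delta(frames, window=2):
    """Regression-based dynamic coefficients for a list of feature vectors.

    d[t] = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2)
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(num_frames):
        row = []
        for dim in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                # Clamp indices so that the first and last frames are repeated.
                nxt = frames[min(t + n, num_frames - 1)][dim]
                prv = frames[max(t - n, 0)][dim]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out

# Toy one-dimensional "static" feature that rises by 1 per frame.
static = [[0.0], [1.0], [2.0], [3.0], [4.0]]
velocity = delta(static)        # first-order dynamics
acceleration = delta(velocity)  # second-order dynamics

# Each frame's final vector is static + velocity + acceleration.
augmented = [s + v + a for s, v, a in zip(static, velocity, acceleration)]
print(augmented[2])
```

For the linear ramp used here, the velocity at the centre frame is the slope of the ramp and the acceleration is zero, which is a quick sanity check on the formula.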


Author(s):  
Tetsuo Kosaka ◽  
Takashi Kusama ◽  
Masaharu Kato ◽  
Masaki Kohda

The aim of this work is to improve the recognition performance of spontaneous speech. To this end, the authors of this chapter propose new approaches to unsupervised adaptation for spontaneous speech and evaluate them using diagonal-covariance and full-covariance hidden Markov models. In the adaptation procedure, language model (LM) adaptation and acoustic model (AM) adaptation are applied iteratively. Several combination methods are tested to find the optimal approach. In the LM adaptation, a word trigram model and a part-of-speech (POS) trigram model are combined to build a more task-specific LM. In addition, the authors propose an unsupervised speaker adaptation technique based on adaptation data weighting, in which the weighting depends on POS class. In Japan, the large-scale spontaneous speech database “Corpus of Spontaneous Japanese (CSJ)” has been used as the common evaluation database for spontaneous speech, and the authors used it for their recognition experiments. The results show that the proposed methods offer a significant advantage on that task.


Author(s):  
O. FAROOQ ◽  
S. DATTA ◽  
M. C. SHROTRIYA

This paper proposes the use of a wavelet transform-based feature extraction technique for Hindi speech recognition. The proposed features take into account temporal as well as frequency-band energy variations for the task of Hindi phoneme recognition. The recognition performance achieved by the proposed features is compared with that of standard MFCC and 24-band admissible wavelet packet-based features using a linear discriminant function-based classifier. To evaluate the robustness of these features, the NOISEX database is used to add different types of noise to the phonemes at signal-to-noise ratios ranging from 20 dB to -5 dB. The recognition results show that, under noisy background conditions, the proposed technique always achieves better performance than MFCC-based features.
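
Adding noise at a chosen signal-to-noise ratio usually means scaling the noise so that the ratio of signal power to noise power matches the target in decibels. The sketch below shows one way to do that. The helper names and the synthetic waveforms are assumptions for illustration and stand in for real phoneme and NOISEX recordings, which are not reproduced here.

```python
import math

def power(samples):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in samples) / len(samples)

def mix_at_snr(signal, noise, snr_db):
    """Scale the noise so that signal power / noise power equals snr_db, then add."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    target_noise_power = power(signal) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / power(noise))
    return [s + gain * n for s, n in zip(signal, noise)]

def measured_snr_db(clean, noisy):
    """SNR in dB, recovered from the clean signal and the noisy mixture."""
    residual = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))

# Synthetic stand-ins for a phoneme segment and a noise recording.
clean = [math.sin(0.05 * i) for i in range(2000)]
noise = [math.sin(1.3 * i) * math.cos(0.37 * i) for i in range(2000)]

for snr_db in (20, 10, 0, -5):
    noisy = mix_at_snr(clean, noise, snr_db)
    print(snr_db, round(measured_snr_db(clean, noisy), 3))
```

Measuring the SNR back from the mixture, as the loop does, is a simple way to confirm that the scaling hit the intended level.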


Author(s):  
Mohamm Huda ◽  
Ghulam Muhammad ◽  
Mohammad Mahedi Hasan ◽  
Sharif Mohammad Musfiqur Rahman ◽  
Foyzul Hassan ◽  
...  

2012 ◽  
Vol 3 (1) ◽  
pp. 1-31 ◽  
Author(s):  
Svetlana Stoyanchev ◽  
Amanda J. Stent

Responsive adaptation in spoken dialog systems involves a change in dialog system behavior in response to a user or a dialog situation. In this paper we address responsive adaptation in the automatic speech recognition (ASR) module of a spoken dialog system. We hypothesize that information about the content of a user utterance may help improve speech recognition for the utterance. We use a two-step process to test this hypothesis: first, we automatically predict the task-relevant concept types likely to be present in a user utterance using features from the dialog context and from the output of first-pass ASR of the utterance; and then, we adapt the ASR's language model to the predicted content of the user's utterance and run a second pass of ASR. We show that: (1) it is possible to achieve high accuracy in determining presence or absence of particular concept types in a post-confirmation utterance; and (2) 2-pass speech recognition with concept type classification and language model adaptation can lead to improved speech recognition performance for post-confirmation utterances.


2014 ◽  
Vol 623 ◽  
pp. 267-273
Author(s):  
Xin Fei Liu ◽  
Hui Zhou

This paper describes a Chinese small-vocabulary offline speech recognition system based on PocketSphinx, in which the acoustic models are regenerated by improving the existing Sphinx models and the language model is generated with the online LMTool. The offline speech recognition system is then built to run on an Android smartphone, using the Android development environment on Linux. The experimental results show that the system, which recognizes voice commands for a cell phone, has good recognition performance.

