A new framework based on KNN and DT for speech identification through emphatic letters in Moroccan dialect

Author(s):  
Bezoui Mouaz ◽  
Cherif Walid ◽  
Beni-Hssane Abderrahim ◽  
Elmoutaouakkil Abdelmajid

<p class="keywords"><span id="docs-internal-guid-6347807a-7fff-e7da-a2d6-74cb8393677f"><span>Arabic dialects differ substantially from Modern Standard Arabic and from each other in phonology, morphology, lexical choice and syntax. This makes identifying dialects from speech a very difficult task. In this paper, we introduce a speech recognition system that automatically identifies the gender of the speaker, the emphatic letter pronounced and the diacritic of that emphatic letter, given a sample of the speaker’s speech. First, we examined the performance of a single classifier, hidden Markov models (HMM), applied to the samples of our data corpus. Then we evaluated our proposed approach, KNN-DT, which is a hybridization of two classifiers, namely decision trees (DT) and k-nearest neighbors (KNN). Each model is applied separately and directly to the data corpus to recognize the emphatic letter of the sound, its diacritic and the gender of the speaker. The hybridization improved speech recognition accuracy by more than 10% compared to state-of-the-art approaches.</span></span></p>
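The abstract does not say how the decision tree and the k-nearest-neighbor classifier are combined, so the following is only a minimal sketch of one possible hybridization, not the authors' method. It assumes that a one-level decision tree (a stump) first routes a sample to a branch and that KNN then votes among the training samples in that branch. The feature names (`pitch_hz`, `energy`), the toy data and all function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority label among the k training samples nearest to query."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def gini(rows):
    """Gini impurity of the labels in rows (0.0 for an empty set)."""
    if not rows:
        return 0.0
    counts = Counter(label for _, label in rows)
    return 1.0 - sum((c / len(rows)) ** 2 for c in counts.values())


def fit_stump(train):
    """Return the (feature index, threshold) with the lowest weighted Gini."""
    best = None
    for f in range(len(train[0][0])):
        values = sorted({x[f] for x, _ in train})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for r in train if r[0][f] <= thr]
            right = [r for r in train if r[0][f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(train)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1], best[2]


def knn_dt_predict(train, query, k=3):
    """Assumed hybrid: the stump picks a branch, KNN votes inside it."""
    f, thr = fit_stump(train)
    branch = [r for r in train if (r[0][f] <= thr) == (query[f] <= thr)]
    return knn_predict(branch, query, k=min(k, len(branch)))


# Hypothetical toy corpus: (pitch_hz, energy) -> speaker gender.
train = [
    ((110.0, 0.20), "male"), ((120.0, 0.30), "male"), ((130.0, 0.25), "male"),
    ((210.0, 0.20), "female"), ((220.0, 0.30), "female"), ((230.0, 0.25), "female"),
]
prediction = knn_dt_predict(train, (125.0, 0.28))
```

On this toy data the stump splits on the first feature, so the neighbor vote only sees samples from the matching branch. A real system would operate on acoustic features extracted from the recordings and would use a full tree rather than a single split.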

1999 ◽  
Vol 08 (01) ◽  
pp. 53-71
Author(s):  
EMDAD KHAN ◽  
ROBERT LEVINSON

In this paper, we explore some new approaches to improving speech recognition accuracy in a noisy environment. The key approaches taken are: (a) using no additional data for training (i.e., only the speakers' data, no noise data) and (b) using no adaptation phase for noise. Instead of adapting in the recognition stage, the preprocessing stage or both, we build a noise-tolerant (noise-rejecting) speech recognition system that tries to reject noise automatically because of its inherent structure. We call our approach a noise rejection-based approach. Noise rejection is achieved by using multiple views and dynamic features of the input sequences. Multiple views exploit more information from the available data, which is used to train multiple HMMs (Hidden Markov Models). This makes the training process simpler and faster, and avoids the need for a noise database, which is often difficult to obtain. The dynamic features (added to the HMM using vector emission probabilities) add more information about the input speech during training. Since the values of the dynamic features of noise are usually much smaller than those of the speech signal, they help reject the noise during recognition. Multiple views (which we also call scrambles) can be used at different stages of the recognition process. This paper explores these possibilities. In addition, multiple views of the input sequence are applied to multiple HMMs during recognition, and the outcomes of the multiple HMMs are combined using a maximum evidence criterion. The accuracy of the noise rejection-based approach is further improved by using Higher Level Decision Making (HLD), our method for data fusion. HLD improves accuracy by efficiently resolving conflicts. The key approaches taken for HLD are: meta reasoning, single cycle training (SCT), confidence factors and view minimization. Our tests show very encouraging results.
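The abstract does not state how its dynamic features are computed. As a hedged illustration of why such features can help reject noise, the sketch below assumes the common first-order "delta" definition (a central difference between neighboring frames) and compares a nearly stationary, noise-like sequence with a rapidly changing, speech-like one. The function names and the toy frame values are hypothetical.

```python
def delta(frames):
    """First-order dynamic features: central difference over time.

    frames is a list of equal-length feature vectors, one per time step.
    The first and last frames are replicated at the edges.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def mean_abs(frames):
    """Average absolute value over every coefficient of every frame."""
    total = sum(abs(v) for frame in frames for v in frame)
    count = sum(len(frame) for frame in frames)
    return total / count


# Hypothetical toy sequences: two coefficients per frame.
noise_like = [[1.0, 2.0], [1.1, 2.0], [1.0, 2.1], [1.1, 2.0]]   # nearly stationary
speech_like = [[1.0, 2.0], [3.0, 0.5], [0.5, 4.0], [2.5, 1.0]]  # changes quickly

noise_dynamics = mean_abs(delta(noise_like))
speech_dynamics = mean_abs(delta(speech_like))
```

Under this assumption, the nearly stationary sequence yields much smaller delta magnitudes than the fast-changing one, which matches the abstract's remark that the dynamic features of noise are usually much smaller than those of speech.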

