Developing Speech Recognition System for Quranic Verse Recitation Learning Software

Author(s):  
Budiman Putra, B. Atmaja, D. Prananto

The Quran, as the holy book of Muslims, prescribes many rules that must be observed to recite its verses properly. If a recitation does not follow these rules, the meaning of the recited verse can differ from the original. Intensive learning is needed to achieve correct recitation, yet the limited availability of teachers and of time to study recitation together in class can be an obstacle to learning. To reduce this obstacle and ease the learning process, we apply speech recognition techniques based on Mel Frequency Cepstral Coefficient (MFCC) features and Gaussian Mixture Model (GMM) modeling, and have designed and developed Quran verse recitation learning software at the prototype stage. The software is interactive multimedia with many features for flexible and effective learning. This paper describes the development of the speech recognition system behind the learning software, which is built to evaluate and correct Quran recitation. The authors present the prototype as built and tested, based on experimental data.
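As a rough illustration of the MFCC + GMM approach described above, the sketch below scores a learner's recitation against a GMM trained on reference recitations of the same verse. The libraries (librosa, scikit-learn), file names, and acceptance threshold are assumptions for illustration; the paper does not disclose its exact toolchain.

    # A minimal sketch of MFCC + GMM recitation scoring, assuming librosa and
    # scikit-learn; file names and the threshold value are hypothetical.
    import numpy as np
    import librosa
    from sklearn.mixture import GaussianMixture

    def extract_mfcc(wav_path, sr=16000, n_mfcc=13):
        # Return MFCCs as an (n_frames, n_mfcc) matrix.
        y, _ = librosa.load(wav_path, sr=sr)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

    # Train one GMM per verse on frames pooled from correct reference recitations.
    train_frames = np.vstack([extract_mfcc(p)
                              for p in ["verse1_ref_a.wav", "verse1_ref_b.wav"]])
    gmm = GaussianMixture(n_components=16, covariance_type="diag",
                          max_iter=200, random_state=0).fit(train_frames)

    # Score a learner's attempt by its average per-frame log-likelihood.
    score = gmm.score(extract_mfcc("verse1_attempt.wav"))
    THRESHOLD = -45.0  # hypothetical; tuned on held-out recitations in practice
    print("accepted" if score > THRESHOLD else "needs correction", round(score, 2))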

2019, Vol 8 (3), pp. 7827-7831

Kannada is a regional language of India, spoken in Karnataka. This paper presents the development of a continuous Kannada speech recognition system using monophone and triphone modelling in HTK. Mel Frequency Cepstral Coefficients (MFCCs) are used for feature extraction; they exploit the cepstral and perceptual frequency scales, which leads to good recognition accuracy. A Hidden Markov Model is used as the classifier. Gaussian mixture splitting is performed to capture the variations of the phones. The paper reports the performance of the continuous Kannada Automatic Speech Recognition (ASR) system with 2, 4, 8, 16, and 32 Gaussian mixtures under both monophone and context-dependent triphone modelling. The experimental results show that context-dependent triphone modelling achieves better recognition accuracy than monophone modelling as the number of Gaussian mixtures is increased.
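The mixture-splitting step can be pictured with a small numpy sketch. HTK-style mixture incrementing, as commonly described, clones the heaviest component, halves its weight, and perturbs the two copies' means by plus/minus 0.2 standard deviations; the sketch below follows that recipe on diagonal-covariance parameters. It is a schematic of the splitting rule only: in a real HTK pipeline each doubling is followed by Baum-Welch re-estimation.

    # A schematic numpy sketch of Gaussian mixture splitting (HTK-style
    # mixture incrementing); variable names and shapes are illustrative.
    import numpy as np

    def split_heaviest(weights, means, variances):
        # Split the highest-weight diagonal-covariance Gaussian into two,
        # halving its weight and offsetting the means by +/-0.2 std.
        k = int(np.argmax(weights))
        std = np.sqrt(variances[k])
        new_w = np.concatenate([weights, [weights[k] / 2.0]])
        new_w[k] = weights[k] / 2.0
        new_m = np.vstack([means, means[k] + 0.2 * std])
        new_m[k] = means[k] - 0.2 * std
        new_v = np.vstack([variances, variances[k]])
        return new_w, new_m, new_v

    def double_mixtures(weights, means, variances):
        # Split repeatedly until the component count doubles.
        target = 2 * len(weights)
        while len(weights) < target:
            weights, means, variances = split_heaviest(weights, means, variances)
        return weights, means, variances

    # Grow a 39-dimensional model from 1 to 32 mixtures: 1 -> 2 -> 4 -> 8 -> 16 -> 32;
    # re-estimation would follow each doubling in practice.
    w, m, v = np.array([1.0]), np.zeros((1, 39)), np.ones((1, 39))
    for _ in range(5):
        w, m, v = double_mixtures(w, m, v)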


Author(s):  
Masoud Geravanchizadeh, Elnaz Forouhandeh, Meysam Bashirpour

The performance of speech recognition systems trained on neutral utterances degrades significantly when they are tested on emotional speech. Since anyone may speak emotionally in real-world environments, the emotional state of speech must be taken into account in the performance of an automatic speech recognition (ASR) system. Limited work has been done in the field of emotion-affected speech recognition; so far, most research has focused on the classification of speech emotions. In this paper, the vocal tract length normalization method is employed to enhance the robustness of an emotion-affected speech recognition system. For this purpose, two speech recognition architectures are used, hybrids of a hidden Markov model with either a Gaussian mixture model or a deep neural network. Frequency warping is applied to the filterbank and/or discrete-cosine-transform domain(s) in the feature extraction process, in such a way that the emotional feature components are normalized toward their corresponding neutral feature components. The performance of the proposed system is evaluated in neutrally trained/emotionally tested conditions for different speech features and emotional states (i.e., Anger, Disgust, Fear, Happy, and Sad), with frequency warping applied to different acoustic features. The emotion-affected speech recognition system is built on the Kaldi ASR toolkit, with the Persian emotional speech database and the crowd-sourced emotional multimodal actors dataset as input corpora. The experimental simulations reveal that, in general, warped emotional features yield better recognition performance than their unwarped counterparts, and that the deep neural network-hidden Markov model hybrid outperforms the Gaussian mixture model hybrid.
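A piecewise-linear warp of the frequency axis, applied to the mel filterbank centre frequencies, is a standard way to realize this kind of filterbank-domain normalization; the sketch below shows one such warp. The cut-off ratio and the example warping factor alpha = 0.9 are conventional VTLN choices for illustration, not values reported by the paper.

    # A minimal sketch of filterbank-domain frequency warping, assuming a
    # conventional piecewise-linear VTLN warp; alpha = 0.9 is illustrative.
    import numpy as np

    def piecewise_linear_warp(f, alpha, f_max=8000.0, cut_ratio=0.85):
        # Scale frequencies by alpha below the cut-off, then interpolate
        # linearly above it so that f_max still maps onto f_max.
        f = np.asarray(f, dtype=float)
        f_cut = cut_ratio * f_max
        warped = alpha * f
        high = f > f_cut
        slope = (f_max - alpha * f_cut) / (f_max - f_cut)
        warped[high] = alpha * f_cut + slope * (f[high] - f_cut)
        return warped

    # Warp the mel filterbank centre frequencies before building the filters;
    # alpha < 1 compresses the spectrum, alpha > 1 stretches it.
    centres = np.linspace(100.0, 7600.0, 26)
    warped_centres = piecewise_linear_warp(centres, alpha=0.9)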


2012, Vol 2012, pp. 1-9
Author(s):  
Peng Dai, Ing Yann Soon, Rui Tao

A new log-power-domain feature enhancement algorithm, named NLPS, is developed. It consists of two parts: direct solution of the nonlinear system model, and log-power subtraction. In contrast to other methods, the proposed algorithm does not need a prior statistical model of speech or noise; instead, it works by directly solving the nonlinear function derived from the speech recognition system. The second part, log-power subtraction, uses separate steps to refine the accuracy of the estimated cepstrum. The proposed algorithm resolves the discontinuity in the speech probability density function (PDF) caused by traditional spectral subtraction algorithms. Its effectiveness is evaluated extensively on the standard AURORA2 database. The results show that significant improvement is achieved: the proposed algorithm reaches a recognition rate of over 86% on noisy speech (averaged over SNRs from 0 dB to 20 dB), a 48% error reduction over the baseline Mel-frequency Cepstral Coefficient (MFCC) system.
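Since the exact NLPS equations are not reproduced here, the sketch below illustrates only the generic idea behind the second stage: subtracting an estimated noise power and taking logs, with a floor to keep the power positive (that flooring is precisely what creates the PDF discontinuity the paper addresses). Estimating noise from the first frames and the flooring constant are common simplifications, not the paper's formulation.

    # A generic sketch of subtraction in the (log-)power domain; the noise
    # estimate and flooring constant are assumptions, not the NLPS equations.
    import numpy as np

    def log_power_subtract(noisy_power, n_noise_frames=10, floor=1e-3):
        # noisy_power: (n_frames, n_bins) STFT power spectrogram.
        noise = noisy_power[:n_noise_frames].mean(axis=0)   # noise power estimate
        clean = noisy_power - noise                         # per-bin subtraction
        clean = np.maximum(clean, floor * noisy_power)      # keep power positive
        return np.log(clean)                                # log-power features

    # The log-power features would then pass through the usual MFCC pipeline
    # (mel filterbank, then DCT) before reaching the recognizer.
    demo = np.abs(np.random.randn(100, 257)) ** 2           # stand-in spectrogram
    features = log_power_subtract(demo)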

