A Normalized Least Mean Square and Dynamic Time Warping (DTW) Algorithm for an Intelligent Quran Tutoring System

Al-Quran is the most recited holy book in the Arabic language. More than 1.3 billion Muslims around the world have an obligation to recite and learn Al-Quran. Learners from non-Arabic as well as Arabic-speaking communities face difficulties with Al-Quran recitation when no teacher (ustad) is available. Advances in speech recognition technology make it possible to develop a system that can listen to a recitation and validate it. This paper investigates the speech recognition accuracy of template-based acoustic models and proposes enhancement methods to improve that accuracy. A new scheme that combines enhanced Normalized Least Mean Square (NLMS) and Dynamic Time Warping (DTW) algorithms is proposed. Recognition accuracy was further improved by incorporating adaptive optimal filtering with a modified Hamming window for the Mel-frequency cepstral coefficients (MFCC), with matching performed by dynamic programming (DP) in the form of DTW. The proposed scheme achieves a 5.5% relative improvement in recognition accuracy over a conventional speech recognition process.
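The abstract does not spell out the NLMS enhancement it proposes, so the following is only a minimal sketch of the textbook NLMS update, in which the step size is divided by the input power in the tap window. The function name `nlms_filter`, the parameters `mu` and `eps`, and the four-tap system `true_taps` are illustrative assumptions, not taken from the paper.

```python
import random


def nlms_filter(reference, desired, num_taps=4, mu=0.5, eps=1e-8):
    """Adapt FIR weights so that filtering `reference` tracks `desired`.

    Returns (weights, errors). Dividing the step by the input power is
    what distinguishes NLMS from plain LMS.
    """
    weights = [0.0] * num_taps
    window = [0.0] * num_taps  # most recent input sample first
    errors = []
    for x, d in zip(reference, desired):
        window = [x] + window[:-1]
        y = sum(w * v for w, v in zip(weights, window))  # filter output
        e = d - y                                        # estimation error
        power = sum(v * v for v in window) + eps         # normalization term
        weights = [w + mu * e * v / power for w, v in zip(weights, window)]
        errors.append(e)
    return weights, errors


# Hypothetical demo: identify an unknown 4-tap system from noise-free data.
random.seed(0)
true_taps = [0.6, -0.3, 0.1, 0.05]
x = [random.gauss(0.0, 1.0) for _ in range(2000)]
d = [sum(true_taps[k] * x[n - k] for k in range(4) if n - k >= 0)
     for n in range(len(x))]
w, err = nlms_filter(x, d)
```

In a recognition front end the same update is typically used to suppress noise before feature extraction. Whether the paper applies it that way is not stated in the abstract.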

Inversion of speech by non-linear transformation of temporary

Health Promotion & Physical Activity ◽ 10.5604/01.3001.0010.7714 ◽ 2016 ◽ Vol 1 (1) ◽ pp. 139-150

Author(s): Robert Wielgat ◽ Anita Lorenc

Keyword(s): Dynamic Time Warping ◽ Mean Square ◽ Time Warping ◽ Mel Frequency Cepstral Coefficients ◽ Precise Method ◽ Electromagnetic Articulography ◽ Acoustic Speech Signal ◽ Preliminary Research ◽ Dynamic Time ◽ Mean Square Errors

Electromagnetic articulography (EMA) is a precise method for assessing the speech articulators, carried out with sensors placed mainly on the tongue. Various methods are being developed to avoid the need for EMA sensors. One of them is speech inversion. This paper describes preliminary research on speech inversion based on the dynamic time warping (DTW) method. Mel-frequency cepstral coefficients (MFCC) were chosen to parametrize the acoustic speech signal. Root mean square errors (RMSE) of the evaluation are presented and discussed.
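For reference, the RMSE used as the evaluation measure can be computed as below. This is a generic sketch: the function name `rmse` and the idea of comparing an estimated trajectory against a measured one, sample by sample, are assumptions about how such an evaluation is usually set up, not details given in the abstract.

```python
import math


def rmse(estimated, measured):
    """Root mean square error between two equal-length sequences."""
    if len(estimated) != len(measured) or not estimated:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(e - m) ** 2 for e, m in zip(estimated, measured)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical sensor positions (arbitrary units).
error = rmse([0.0, 0.0], [3.0, 4.0])
```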

Environmental effects on reliability and accuracy of MFCC based voice recognition for industrial human-robot-interaction

Proceedings of the Institution of Mechanical Engineers, Part B: Journal of Engineering Manufacture ◽ 10.1177/09544054211014492 ◽ 2021 ◽ pp. 095440542110144

Author(s): B Birch ◽ CA Griffiths ◽ A Morgan

Keyword(s): Speech Recognition ◽ Voice Recognition ◽ Human Robot Interaction ◽ Hole Drilling ◽ Time Warping ◽ Mel Frequency Cepstral Coefficients ◽ Robot Interaction ◽ Extraction Algorithm ◽ Dynamic Time ◽ Manufacturing Environments

Collaborative robots are becoming increasingly important for advanced manufacturing processes. The purpose of this paper is to determine the capability of a novel Human-Robot-Interface to be used for machine hole drilling. Using a purpose-developed voice activation system, the effects of environmental factors on speech recognition accuracy are considered. The research investigates the accuracy of a Mel Frequency Cepstral Coefficients-based feature extraction algorithm that uses Dynamic Time Warping to compare an utterance to a limited, user-dependent dictionary. The developed speech recognition method allows for Human-Robot-Interaction through a novel method of integrating the voice recognition with the robot. The system can be utilised in many manufacturing environments where robot motions can be coupled to voice inputs rather than to time-consuming physical interfaces. However, uptake is limited in industries where the volume of background machine noise is high.
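The comparison of an utterance against a small user-dependent dictionary can be sketched as nearest-template matching under a DTW distance. The code below is a minimal illustration under stated assumptions: the two-dimensional feature vectors stand in for real MFCC frames, and `dtw_distance`, `recognize`, and the dictionary entries are hypothetical names, not the authors' implementation.

```python
import math


def dtw_distance(a, b):
    """DTW cost between two sequences of equal-dimension feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = local + min(cost[i - 1][j - 1],  # match
                                     cost[i - 1][j],      # stretch b
                                     cost[i][j - 1])      # stretch a
    return cost[n][m]


def recognize(utterance, dictionary):
    """Return the dictionary word whose template is closest under DTW."""
    return min(dictionary, key=lambda word: dtw_distance(utterance, dictionary[word]))


# Hypothetical templates; real entries would be MFCC frame sequences.
dictionary = {
    "start": [(0, 0), (1, 0), (2, 1)],
    "stop": [(2, 2), (1, 2), (0, 1)],
}
utterance = [(0, 0), (0, 0), (1, 0), (1, 0), (2, 1)]  # a slower "start"
word = recognize(utterance, dictionary)
```

Because every template is compared in turn, the cost grows with dictionary size, which suits the limited vocabulary the abstract describes.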

Dynamic Time Warping based speech recognition for isolated sinhala words

2012 IEEE 55th International Midwest Symposium on Circuits and Systems (MWSCAS) ◽ 10.1109/mwscas.2012.6292164 ◽ 2012 ◽ Cited By ~ 2

Author(s): P. G. N. Priyadarshani ◽ N. G. J. Dias ◽ Amal Punchihewa

Keyword(s): Speech Recognition ◽ Dynamic Time Warping ◽ Time Warping ◽ Dynamic Time

Merge-Weighted Dynamic Time Warping for Speech Recognition

Journal of Computer Science and Technology ◽ 10.1007/s11390-014-1491-0 ◽ 2014 ◽ Vol 29 (6) ◽ pp. 1072-1082 ◽ Cited By ~ 4

Author(s): Xiang-Lilan Zhang ◽ Zhi-Gang Luo ◽ Ming Li

Keyword(s): Speech Recognition ◽ Dynamic Time Warping ◽ Time Warping ◽ Dynamic Time

An investigation of the use of dynamic time warping for word spotting and connected speech recognition

10.1109/icassp.1980.1171067 ◽ 2005 ◽ Cited By ~ 15

Author(s): C. Myers ◽ L. Rabiner ◽ A. Rosenberg

Keyword(s): Speech Recognition ◽ Dynamic Time Warping ◽ Time Warping ◽ Connected Speech ◽ Word Spotting ◽ Dynamic Time

Vietnamese speech recognition using Dynamic Time Warping and Coefficient of Correlation

2013 International Conference on Control, Automation and Information Sciences (ICCAIS) ◽ 10.1109/iccais.2013.6720531 ◽ 2013

Author(s): Vu Duc Lung ◽ Vu N. Truong

Keyword(s): Speech Recognition ◽ Dynamic Time Warping ◽ Time Warping ◽ Coefficient Of Correlation ◽ Dynamic Time

An Improved Method In Speech Signal Input Representation Based On DTW Technique For NN Speech Recognition System

Jurnal Teknologi ◽ 10.11113/jt.v46.291 ◽ 2012

Author(s): Rubita Sudirman ◽ Sh. Hussain Salleh ◽ Shaharuddin Salleh

Keyword(s): Neural Network ◽ Speech Recognition ◽ Conjugate Gradient ◽ Template Matching ◽ Dynamic Time Warping ◽ Gradient Descent ◽ Reference Frames ◽ Time Warping ◽ Dynamic Time ◽ Quasi Newton

This paper presents a pre-processing of linear predictive coding (LPC) coefficient features to prepare reliable reference templates for the set of words to be recognized using an artificial neural network (NN). The paper also proposes using a pitch feature derived from the recorded speech data as an additional input feature. The Dynamic Time Warping (DTW) algorithm is the backbone of the newly developed DTW frame fixing algorithm (DTW–FF), which is designed to perform template matching for the input pre-processing. The purpose of the new algorithm is to align the input frames in the test set to the template frames in the reference set. This frame normalization is required because the NN is designed to compare data of the same length, whereas the same word is usually spoken with varying length. With frame fixing, the input frames and the reference frames are adjusted to the same number of frames, following the reference frames. Another task of the study is to extract pitch features using the harmonic filter algorithm. After pitch extraction, and once the LPC features have been fixed to the desired number of frames, speech recognition is performed with the neural network. The results show that recognition accuracy as high as 98% can be achieved using the combination of the DTW–FF feature and the pitch feature. At the end of the paper, the convergence rates of the conjugate gradient descent (CGD), Quasi–Newton, and steepest gradient descent (SGD) search directions are compared, and the results show that CGD outperformed Quasi–Newton and SGD.

Key words: Dynamic time warping, time normalization, neural network, speech recognition, conjugate gradient descent
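The abstract does not give the details of DTW–FF, so the sketch below shows only one plausible way to normalize an input to the reference frame count. It backtracks the DTW alignment path and averages the input frames that map to each reference frame. The names `dtw_path` and `fix_frames`, the averaging rule, and the one-dimensional toy frames are all assumptions made for illustration.

```python
import math


def dtw_path(test, ref):
    """Return the optimal DTW alignment as a list of (test_idx, ref_idx)."""
    inf = float("inf")
    n, m = len(test), len(ref)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(test[i - 1], ref[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1],
                                     cost[i - 1][j],
                                     cost[i][j - 1])
    # Backtrack from the last cell along the cheapest predecessors.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    return path[::-1]


def fix_frames(test, ref):
    """Resample `test` to len(ref) frames by averaging aligned frames."""
    groups = [[] for _ in ref]
    for ti, ri in dtw_path(test, ref):
        groups[ri].append(test[ti])
    # Every reference index appears on the path, so no group is empty.
    return [tuple(sum(col) / len(g) for col in zip(*g)) for g in groups]


# Hypothetical 1-D frames; real inputs would be LPC coefficient vectors.
ref = [(0.0,), (1.0,), (2.0,)]
test = [(0.0,), (0.1,), (1.0,), (1.9,), (2.0,), (2.1,)]
fixed = fix_frames(test, ref)
```

The output always has as many frames as the reference, which is the property a fixed-size neural network input layer needs.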

FIELD-PROGRAMMABLE GATE ARRAY IMPLEMENTATION OF THE DYNAMIC TIME WARPING ALGORITHM FOR SPEECH RECOGNITION

Asian Journal of Pharmaceutical and Clinical Research ◽ 10.22159/ajpcr.2017.v10s1.19753 ◽ 2017 ◽ Vol 10 (13) ◽ pp. 248

Author(s): John Sahaya Rani Alex ◽ Mitali Bhojwani

Keyword(s): Speech Recognition ◽ Field Programmable Gate Array ◽ Dynamic Time Warping ◽ Recognition Algorithm ◽ Consumer Electronics ◽ Time Warping ◽ Time To Market ◽ Computing Power ◽ Field Programmable ◽ Dynamic Time

The objective of this research is to implement a speech recognition algorithm on a device with a smaller form factor. Speech recognition is used extensively in mobile and numerous consumer electronics devices. The dynamic time warping (DTW) method, which is based on dynamic programming, is chosen for implementation given the current trend of growing computing power. A field-programmable gate array (FPGA) implementation of DTW is chosen for its flexibility, parallelization, and shorter time to market. The algorithm is implemented in Verilog on Xilinx ISE. The simulation output verifies that the warping cost is lower for similar sequences and higher for dissimilar ones. The results indicate that a real-time implementation of DTW-based speech recognition could be achieved in the future.
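The warping-cost behaviour reported here can be reproduced in software with a small integer DTW. The sketch below is not the authors' Verilog design. It assumes an absolute-difference local cost and keeps only two rows of the cost matrix, a memory layout often favoured in hardware. The function name `dtw_cost` and the sample sequences are hypothetical.

```python
def dtw_cost(a, b):
    """Integer DTW cost using |x - y| and two rows of storage."""
    inf = float("inf")
    prev = [0] + [inf] * len(b)  # row for "no samples of a consumed yet"
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = abs(x - y) + min(prev[j - 1],  # diagonal step
                                       prev[j],      # repeat sample of b
                                       curr[j - 1])  # repeat sample of a
        prev = curr
    return prev[len(b)]


# Hypothetical quantized feature sequences.
template = [1, 3, 5, 3, 1]
similar = [1, 1, 3, 5, 5, 3, 1]  # same shape, spoken more slowly
dissimilar = [5, 1, 5, 1, 5]
cost_similar = dtw_cost(similar, template)
cost_dissimilar = dtw_cost(dissimilar, template)
```

A time-stretched copy of the template aligns with no mismatch, while the dissimilar sequence accumulates a positive cost, which matches the behaviour the abstract describes.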

Syllable based Turkish speech recognition using Dynamic Time Warping and Multilayer Perceptron

2008 IEEE 16th Signal Processing, Communication and Applications Conference ◽ 10.1109/siu.2008.4632732 ◽ 2008 ◽ Cited By ~ 1

Author(s): Rifat Asliyan ◽ Korhan Gunel ◽ Tatyana Yakhno

Keyword(s): Speech Recognition ◽ Multilayer Perceptron ◽ Dynamic Time Warping ◽ Time Warping ◽ Dynamic Time

Analisis Speaker Recognition Menggunakan Metode Dynamic Time Warping (DTW) Berbasis Matlab

AVITEC ◽ 10.28989/avitec.v1i1.492 ◽ 2019 ◽ Vol 1 (1)

Author(s): Noor Fita Indri Prayoga

Keyword(s): Feature Extraction ◽ Speaker Recognition ◽ Dynamic Time Warping ◽ Recognition Accuracy ◽ Recognition System ◽ Extraction Process ◽ Test Results ◽ Time Warping ◽ Dynamic Time ◽ The Voice

Voice is one way to communicate and express oneself. Speaker recognition is a process carried out by a device to recognize the speaker through the voice. This study designed a speaker recognition system, implemented in MATLAB, that identifies speakers from what they say using the dynamic time warping (DTW) method. The system first processes reference data and test data. Both follow the same pipeline: sound recording, preprocessing, and feature extraction. Features are extracted with the Fast Fourier Transform (FFT). The features extracted from the two sets of data are then compared using DTW, and the reference that yields the smallest DTW value is selected as the output. The test results show that the system can identify the voice with a best recognition accuracy of 90% and an average recognition accuracy of 80%. The results were obtained from 50 tests carried out by 5 people (3 men and 2 women), each of whom said a predetermined word.
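The feature extraction stage of this pipeline can be illustrated with a small FFT written in Python rather than MATLAB. This is a sketch under assumptions: the frame length, the non-overlapping framing, and the names `fft` and `magnitude_features` are illustrative, and the abstract does not say how the study frames or windows the signal. The resulting per-frame magnitude spectra are the kind of feature sequence that would then be compared with DTW.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitude_features(signal, frame_len=8):
    """Magnitude spectrum (bins 0..frame_len/2) of each non-overlapping frame."""
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    return [[abs(c) for c in fft(signal[s:s + frame_len])[:frame_len // 2 + 1]]
            for s in starts]


# Hypothetical test tone: two cycles per 8-sample frame, 16 samples long.
signal = [math.cos(2 * math.pi * 2 * t / 8) for t in range(16)]
features = magnitude_features(signal)
```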