Building Sequence Kernels for Speaker Verification and Word Recognition

2011 ◽

pp. 246-262

Author(s):

Vincent Wan

Keyword(s):

Speech Recognition ◽

Speech Processing ◽

Kernel Methods ◽

Speaker Recognition ◽

Dynamic Time Warping ◽

Speaker Verification ◽

Dimensional Space ◽

Time Warping ◽

Recognition Systems ◽

Dynamic Time

This chapter describes the adaptation and application of kernel methods for speech processing. It is divided into two sections dealing with speaker verification and isolated-word speech recognition applications. Significant advances in kernel methods have been realised in the field of speaker verification, particularly relating to the direct scoring of variable-length speech utterances by sequence kernel SVMs. The improvements are so substantial that most state-of-the-art speaker recognition systems now incorporate SVMs. We describe the architecture of some of these sequence kernels. Speech recognition presents additional challenges to kernel methods and their application in this area is not as straightforward as for speaker verification. We describe a sequence kernel that uses dynamic time warping to capture temporal information within the kernel directly. The formulation also extends the standard dynamic time-warping algorithm by enabling the dynamic alignment to be computed in a high-dimensional space induced by a kernel function. This kernel is shown to work well in an application for recognising low-intelligibility speech of severely dysarthric individuals.
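The idea of computing the dynamic alignment in a kernel-induced space can be illustrated with a small sketch. It is not the chapter's own formulation: it only assumes that the local frame distance in DTW is replaced by the distance in feature space, obtained from the standard identity d(x, y)^2 = k(x, x) - 2k(x, y) + k(y, y). The RBF kernel, the `gamma` value and the function names are illustrative choices.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def kernel_distance(x, y, kernel=rbf_kernel):
    """Distance between x and y in the feature space induced by `kernel`."""
    sq = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors

def kernel_dtw(seq_a, seq_b, kernel=rbf_kernel):
    """Accumulated DTW cost using kernel-induced local distances."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = kernel_distance(seq_a[i - 1], seq_b[j - 1], kernel)
            # Standard symmetric step pattern: insertion, deletion, match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Toy sequences of one-dimensional frames (hypothetical data).
a = [[0.0], [1.0], [2.0]]
b = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # time-stretched copy of a
c = [[2.0], [0.0], [2.0]]                # different content
print(kernel_dtw(a, b), kernel_dtw(a, c))
```

Only kernel evaluations are needed, so the feature-space mapping never has to be computed explicitly.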

Design of Speaker Verification using Dynamic Time Warping (DTW) on Graphical Programming for Authentication Process

Journal of Information Technology and Computer Science ◽

10.25126/jitecs.20172124 ◽

2017 ◽

Vol 2 (1) ◽

Author(s):

Barlian Henryranu Prasetio ◽

Dahnial Syauqy

Keyword(s):

Speech Recognition ◽

Dynamic Time Warping ◽

Speaker Verification ◽

Time Warping ◽

Graphical Programming ◽

Human Voice ◽

Authentication System ◽

Dynamic Time

Authentication is generally required on systems that need safety and privacy. Typed usernames and passwords are the most common authentication mechanism, but this approach is known to have many weaknesses. To overcome them, many authentication systems based on the voice as a unique human characteristic have been proposed. We implement the Dynamic Time Warping algorithm to compare a human voice with a reference voice as the authentication process. The test results show that the system's average speech recognition accuracy is 86.785%.
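Comparing a test voice with a reference voice by DTW can be sketched as follows. This is a minimal illustration, not the system described in the abstract: the scalar features, the length normalisation and the `threshold` value are all assumptions chosen for the example.

```python
def dtw_distance(a, b):
    """DTW cost between two sequences of scalar features."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local cost: absolute difference
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def verify(test_features, reference_features, threshold):
    """Accept the claimed identity if the length-normalised DTW cost is small."""
    total = dtw_distance(test_features, reference_features)
    normalised = total / (len(test_features) + len(reference_features))
    return normalised <= threshold

# Hypothetical feature tracks; real systems use multi-dimensional frames.
reference = [1.0, 2.0, 3.0, 2.0, 1.0]
genuine = [1.0, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0]  # same shape, spoken more slowly
impostor = [3.0, 1.0, 0.0, 1.0, 3.0]

print(verify(genuine, reference, threshold=0.1))
print(verify(impostor, reference, threshold=0.1))
```

In practice the threshold would be tuned on held-out recordings to balance false acceptances against false rejections.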

Speaker recognition based on dynamic time warping and Gaussian mixture model

2020 39th Chinese Control Conference (CCC) ◽

10.23919/ccc50068.2020.9188632 ◽

2020 ◽

Author(s):

Nannan Zhang ◽

Yanru Yao

Keyword(s):

Gaussian Mixture Model ◽

Mixture Model ◽

Speaker Recognition ◽

Dynamic Time Warping ◽

Gaussian Mixture ◽

Time Warping ◽

Dynamic Time

A Low-Power Text-Dependent Speaker Verification System with Narrow-Band Feature Pre-Selection and Weighted Dynamic Time Warping

10.21437/odyssey.2016-1 ◽

2016 ◽

Author(s):

Qing He ◽

Gregory Wornell ◽

Wei Ma

Keyword(s):

Low Power ◽

Dynamic Time Warping ◽

Narrow Band ◽

Speaker Verification ◽

Time Warping ◽

Verification System ◽

Dynamic Time ◽

Text Dependent Speaker Verification

Dynamic Time Warping based speech recognition for isolated sinhala words

2012 IEEE 55th International Midwest Symposium on Circuits and Systems (MWSCAS) ◽

10.1109/mwscas.2012.6292164 ◽

2012 ◽

Cited By ~ 2

Author(s):

P. G. N. Priyadarshani ◽

N. G. J. Dias ◽

Amal Punchihewa

Keyword(s):

Speech Recognition ◽

Dynamic Time Warping ◽

Time Warping ◽

Dynamic Time

Merge-Weighted Dynamic Time Warping for Speech Recognition

Journal of Computer Science and Technology ◽

10.1007/s11390-014-1491-0 ◽

2014 ◽

Vol 29 (6) ◽

pp. 1072-1082 ◽

Cited By ~ 4

Author(s):

Xiang-Lilan Zhang ◽

Zhi-Gang Luo ◽

Ming Li

Keyword(s):

Speech Recognition ◽

Dynamic Time Warping ◽

Time Warping ◽

Dynamic Time

An investigation of the use of dynamic time warping for word spotting and connected speech recognition

10.1109/icassp.1980.1171067 ◽

2005 ◽

Cited By ~ 15

Author(s):

C. Myers ◽

L. Rabiner ◽

A. Rosenberg

Keyword(s):

Speech Recognition ◽

Dynamic Time Warping ◽

Time Warping ◽

Connected Speech ◽

Word Spotting ◽

Dynamic Time

Vietnamese speech recognition using Dynamic Time Warping and Coefficient of Correlation

2013 International Conference on Control, Automation and Information Sciences (ICCAIS) ◽

10.1109/iccais.2013.6720531 ◽

2013 ◽

Author(s):

Vu Duc Lung ◽

Vu N. Truong

Keyword(s):

Speech Recognition ◽

Dynamic Time Warping ◽

Time Warping ◽

Coefficient Of Correlation ◽

Dynamic Time

An Improved Method In Speech Signal Input Representation Based On DTW Technique For NN Speech Recognition System

Jurnal Teknologi ◽

10.11113/jt.v46.291 ◽

2012 ◽

Author(s):

Rubita Sudirman ◽

Sh. Hussain Salleh ◽

Shaharuddin Salleh

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Conjugate Gradient ◽

Template Matching ◽

Dynamic Time Warping ◽

Gradient Descent ◽

Reference Frames ◽

Time Warping ◽

Dynamic Time ◽

Quasi Newton

This paper presents a pre-processing of linear predictive coefficient (LPC) features to prepare reliable reference templates for the set of words to be recognized using an artificial neural network (NN). The paper also proposes using a pitch feature derived from the recorded speech data as another input feature. The Dynamic Time Warping (DTW) algorithm is the backbone of the newly developed DTW frame-fixing algorithm (DTW-FF), which is designed to perform template matching for input preprocessing. The purpose of the new algorithm is to align the input frames in the test set to the template frames in the reference set. This frame normalization is required because the NN is designed to compare data of the same length, whereas utterances of the same word usually vary in length. With frame fixing, the input and reference frames are adjusted to the same number of frames, following the reference frames. Another task of the study is to extract pitch features using the Harmonic Filter algorithm. After pitch extraction, and once the LPC features have been fixed to the desired number of frames, speech recognition using the neural network is performed. The results are promising: recognition accuracy as high as 98% is achieved using the combination of the two features above. At the end of the paper, a convergence comparison between conjugate gradient descent (CGD), Quasi-Newton, and steepest gradient descent (SGD) search directions is performed, and the results show that CGD outperformed Quasi-Newton and SGD.

Key words: Dynamic time warping, time normalization, neural network, speech recognition, conjugate gradient descent
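Normalising a variable-length input to the frame count of a reference can be sketched with a DTW alignment path. This is not the paper's DTW-FF algorithm, whose details are not given in the abstract: the sketch simply assumes that input frames aligned to the same reference frame are averaged, and it uses scalar frames and hypothetical function names for brevity.

```python
def dtw_path(inp, ref):
    """Return the optimal DTW alignment as a list of (input_idx, ref_idx) pairs."""
    n, m = len(inp), len(ref)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(inp[i - 1] - ref[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end of both sequences to the start.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return path

def fix_frames(inp, ref):
    """Warp `inp` so that it has exactly len(ref) frames."""
    groups = [[] for _ in ref]
    for i, j in dtw_path(inp, ref):
        groups[j].append(inp[i])
    # Every reference index appears on the path, so no group is empty.
    return [sum(g) / len(g) for g in groups]

reference = [1.0, 2.0, 3.0]
slow_input = [1.0, 1.0, 2.0, 3.0, 3.0]
print(fix_frames(slow_input, reference))
```

The output always has as many frames as the reference, which is the property a fixed-input-size neural network needs.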

FIELD-PROGRAMMABLE GATE ARRAY IMPLEMENTATION OF THE DYNAMIC TIME WARPING ALGORITHM FOR SPEECH RECOGNITION

Asian Journal of Pharmaceutical and Clinical Research ◽

10.22159/ajpcr.2017.v10s1.19753 ◽

2017 ◽

Vol 10 (13) ◽

pp. 248

Author(s):

John Sahaya Rani Alex ◽

Mitali Bhojwani

Keyword(s):

Speech Recognition ◽

Field Programmable Gate Array ◽

Dynamic Time Warping ◽

Recognition Algorithm ◽

Consumer Electronics ◽

Time Warping ◽

Time To Market ◽

Computing Power ◽

Field Programmable ◽

Dynamic Time

The objective of this research is to implement a speech recognition algorithm in a smaller form-factor device. Speech recognition is used extensively in mobile and numerous consumer electronics devices. The dynamic time warping (DTW) method, which is based on dynamic programming, is chosen for speech recognition because of the latest trend in evolving computing power. A field-programmable gate array implementation of DTW is chosen for its flexibility, parallelization and shorter time to market. The algorithm is implemented using Verilog on Xilinx ISE. The simulation output verifies that the warping cost is lower for similar sequences and higher for dissimilar ones. The results indicate that a real-time implementation of DTW-based speech recognition could be achieved in future.