Speaker recognition based on dynamic time warping and Gaussian mixture model

Author(s):  
Nannan Zhang ◽  
Yanru Yao


2017 ◽  
Vol 15 (2) ◽  
pp. 217 ◽  
Author(s):  
Maria Kyrarini ◽  
Muhammad Abdul Haseeb ◽  
Danijela Ristić-Durrant ◽  
Axel Gräser

Robot learning from demonstration is a method that enables robots to learn in a way similar to humans. In this paper, a framework that enables robots to learn from multiple human demonstrations via kinesthetic teaching is presented. The subject of learning is a high-level sequence of actions, as well as the low-level trajectories the robot must follow to perform the object-manipulation task. The multiple human demonstrations are recorded, and only the most similar demonstrations are selected for robot learning. The high-level learning module identifies the sequence of actions of the demonstrated task. Using dynamic time warping (DTW) and a Gaussian mixture model (GMM), a model of the demonstrated trajectories is learned. The learned trajectory is generated by Gaussian mixture regression (GMR) from the learned Gaussian mixture model. In the online working phase, the sequence of actions is identified, and experimental results show that the robot performs the learned task successfully.
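A minimal sketch of this DTW + GMM/GMR pipeline, assuming simple 1-D demonstration trajectories and using NumPy, SciPy, and scikit-learn (an illustration of the general technique, not the authors' implementation; every function name and parameter below is an assumption):

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture


def dtw_path(a, b):
    """Classic O(len(a) * len(b)) DTW; returns the index pairs of the optimal path."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    path, i, j = [], n, m                      # backtrack from (n, m)
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def align_to_reference(demos):
    """Warp every demonstration onto the time axis of the first one."""
    ref = demos[0]
    aligned = [ref]
    for d in demos[1:]:
        warped = np.empty_like(ref)
        for i, j in dtw_path(ref, d):
            warped[i] = d[j]                   # later matches per index overwrite
        aligned.append(warped)
    return np.array(aligned)


def gmr(gmm, t_query):
    """Condition a 2-D GMM over (t, x) on time t and return E[x | t]."""
    means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_
    x_hat = np.zeros(len(t_query))
    for idx, t in enumerate(t_query):
        h = np.array([w * norm.pdf(t, m[0], np.sqrt(c[0, 0]))
                      for w, m, c in zip(weights, means, covs)])
        h /= h.sum()                           # responsibility of each component at t
        cond = [m[1] + c[1, 0] / c[0, 0] * (t - m[0]) for m, c in zip(means, covs)]
        x_hat[idx] = np.dot(h, cond)
    return x_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 100)
    # Three noisy, slightly time-shifted demonstrations of the same 1-D motion.
    demos = [np.sin(2 * np.pi * (t + s)) + 0.05 * rng.standard_normal(t.size)
             for s in (0.00, 0.02, -0.03)]
    aligned = align_to_reference(demos)
    data = np.column_stack([np.tile(t, len(aligned)), aligned.ravel()])
    gmm = GaussianMixture(n_components=6, covariance_type="full",
                          random_state=0).fit(data)
    learned = gmr(gmm, t)                      # the reproduced trajectory
    print("learned trajectory, first 5 samples:", np.round(learned[:5], 3))
```

Conditioning the joint (time, position) mixture on time is what turns the learned GMM into a smooth, averaged trajectory; multi-dimensional trajectories follow the same pattern with block partitions of the means and covariances.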


2017 ◽  
Vol 10 (13) ◽  
pp. 140
Author(s):  
Kumari Piu Gorai ◽  
Thomas Abraham

A human being has many unique features, and one of them is the voice. Speaker recognition is the use of a system to distinguish and identify a person from his/her vocal sound. A speaker recognition system (SRS) can be used as an authentication technique in addition to the conventional authentication methods. This paper presents an overview of voice-signal characteristics and speaker recognition techniques. It also discusses the advantages and problems of current SRSs. Since a voice-based SRS is the only biometric system that allows users to authenticate remotely, a robust SRS is needed.


AVITEC ◽  
2019 ◽  
Vol 1 (1) ◽  
Author(s):  
Noor Fita Indri Prayoga

Voice is one way to communicate and express oneself. Speaker recognition is a process carried out by a device to recognize the speaker through the voice. This study designed a MATLAB-based speaker recognition system able to identify speakers based on what was said, using the dynamic time warping (DTW) method. The design begins with the preparation of reference data and test data. Both are prepared with the same process, which starts with sound recording, followed by preprocessing and feature extraction. In this system, the fast Fourier transform (FFT) method is used to extract the features. The features extracted from the two data sets are then compared using the DTW method, and the reference that produces the smallest DTW value is taken as the output. The test results show that the system can identify the voice with a best recognition accuracy of 90% and an average recognition accuracy of 80%. The results were obtained from 50 tests carried out by 5 people (3 men and 2 women), with each speaker saying a predetermined word.
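The recognition step can be illustrated with a small Python sketch (the paper's system is implemented in MATLAB; the frame length, hop size, and all names below are assumptions): each recording is reduced to frame-wise log-magnitude FFT features, and the test utterance is assigned to the reference whose template gives the smallest DTW distance.

```python
import numpy as np


def fft_features(signal, frame_len=256, hop=128):
    """Frame the signal and keep the log-magnitude of the positive FFT bins."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    spectra = np.abs(np.fft.rfft(np.array(frames) * np.hanning(frame_len), axis=1))
    return np.log(spectra + 1e-8)              # shape: (n_frames, frame_len // 2 + 1)


def dtw_distance(a, b):
    """DTW distance between two feature sequences (one row per frame)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def identify(test_signal, references):
    """Return the reference speaker whose template is closest to the test signal."""
    test_feat = fft_features(test_signal)
    dists = {name: dtw_distance(test_feat, fft_features(sig))
             for name, sig in references.items()}
    return min(dists, key=dists.get), dists


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-ins for recorded utterances (real use: recorded audio).
    n = np.arange(8000)
    refs = {"speaker_A": np.sin(2 * np.pi * 200 * n / 8000),
            "speaker_B": np.sin(2 * np.pi * 350 * n / 8000)}
    test = refs["speaker_B"] + 0.1 * rng.standard_normal(n.size)
    best, scores = identify(test, refs)
    print("identified as:", best)
```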


2019 ◽  
Vol 9 (13) ◽  
pp. 2636 ◽  
Author(s):  
Yan Shi ◽  
Juanjuan Zhou ◽  
Yanhua Long ◽  
Yijie Li ◽  
Hongwei Mao

Automatic speaker verification (ASV) has achieved significant progress in recent years. However, it is still very challenging to generalize ASV technologies to new, unknown, and spoofing conditions. Most previous studies focused on extracting the speaker information from natural speech. This paper attempts to address speaker verification from another perspective: the speaker identity information is exploited from singing speech. We first designed and released a new corpus for speaker verification based on singing and normal reading speech. Then, speaker discrimination was compared and analyzed between natural and singing speech in different feature spaces. Furthermore, the conventional Gaussian mixture model, dynamic time warping, and a state-of-the-art deep neural network were investigated and used to build text-dependent ASV systems with different training-test conditions. Experimental results show that the voiceprint information in singing speech is more distinguishable than that in normal speech. A relative reduction of more than 20% in equal error rate was obtained on both the gender-dependent and gender-independent 1 s–1 s evaluation tasks.
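For context, the equal error rate used as the metric here is the operating point where the false-acceptance and false-rejection rates coincide. A small sketch of how it is commonly computed from verification scores follows; this is generic evaluation code under assumed score distributions, not the authors' evaluation pipeline.

```python
import numpy as np


def equal_error_rate(target_scores, impostor_scores):
    """Sweep every score as a threshold and return the EER and that threshold."""
    scores = np.concatenate([target_scores, impostor_scores])
    labels = np.concatenate([np.ones(len(target_scores)),
                             np.zeros(len(impostor_scores))])
    order = np.argsort(scores)
    scores, labels = scores[order], labels[order]
    # Treat each score as a threshold: trials scoring at or below it are rejected.
    frr = np.cumsum(labels) / labels.sum()                   # false-rejection rate
    far = 1.0 - np.cumsum(1 - labels) / (1 - labels).sum()   # false-acceptance rate
    idx = np.argmin(np.abs(far - frr))                       # where the two rates meet
    return (far[idx] + frr[idx]) / 2.0, scores[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical verification scores (e.g., log-likelihood ratios from a GMM
    # or similarity scores from a DNN system); purely illustrative.
    target = rng.normal(2.0, 1.0, 1000)      # genuine trials
    impostor = rng.normal(0.0, 1.0, 1000)    # impostor trials
    eer, threshold = equal_error_rate(target, impostor)
    print(f"EER = {eer:.3f} at threshold {threshold:.2f}")
```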


Author(s):  
Vincent Wan

This chapter describes the adaptation and application of kernel methods for speech processing. It is divided into two sections dealing with speaker verification and isolated-word speech recognition applications. Significant advances in kernel methods have been realised in the field of speaker verification, particularly relating to the direct scoring of variable-length speech utterances by sequence kernel SVMs. The improvements are so substantial that most state-of-the-art speaker recognition systems now incorporate SVMs. We describe the architecture of some of these sequence kernels. Speech recognition presents additional challenges to kernel methods and their application in this area is not as straightforward as for speaker verification. We describe a sequence kernel that uses dynamic time warping to capture temporal information within the kernel directly. The formulation also extends the standard dynamic time-warping algorithm by enabling the dynamic alignment to be computed in a high-dimensional space induced by a kernel function. This kernel is shown to work well in an application for recognising low-intelligibility speech of severely dysarthric individuals.
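A compact sketch of the core idea, one plausible reading rather than the chapter's exact formulation: the local DTW cost is measured in the feature space induced by an RBF kernel, where ||phi(x) - phi(y)||^2 = 2 - 2*k(x, y), so the alignment is computed in the high-dimensional space without ever forming it explicitly. Note that similarities built from DTW scores in this way are not guaranteed to be positive semi-definite kernels.

```python
import numpy as np


def rbf(x, y, gamma=0.5):
    """Radial basis function kernel between two feature frames."""
    return np.exp(-gamma * np.sum((x - y) ** 2))


def dtw_in_feature_space(seq_a, seq_b, gamma=0.5):
    """DTW alignment cost with local costs measured in the RBF-induced space."""
    n, m = len(seq_a), len(seq_b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Squared distance between the mapped frames, via the kernel trick.
            local = 2.0 - 2.0 * rbf(seq_a[i - 1], seq_b[j - 1], gamma)
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1],
                                     cost[i - 1, j - 1])
    return cost[n, m]


def sequence_kernel(seq_a, seq_b, gamma=0.5, scale=0.1):
    """Turn the alignment cost into a similarity usable as an SVM kernel value."""
    return np.exp(-scale * dtw_in_feature_space(seq_a, seq_b, gamma))


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    # Two toy utterances as sequences of 13-dimensional feature frames.
    utt_a = rng.standard_normal((40, 13))
    utt_b = utt_a[::2] + 0.1 * rng.standard_normal((20, 13))   # warped copy
    utt_c = rng.standard_normal((35, 13))                      # unrelated sequence
    print("k(a, warped copy of a):", round(sequence_kernel(utt_a, utt_b), 4))
    print("k(a, unrelated):       ", round(sequence_kernel(utt_a, utt_c), 4))
```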


2016 ◽  
Vol 25 (3) ◽  
pp. 387-399
Author(s):  
P. Mahesha ◽  
D.S. Vinod

The classification of dysfluencies is one of the important steps in the objective measurement of stuttering disorder. In this work, the focus is on investigating the applicability of the automatic speaker recognition (ASR) method to stuttering dysfluency recognition. The system designed for this particular task relies on the Gaussian mixture model (GMM), which is the most widely used probabilistic modeling technique in ASR. The GMM parameters are estimated from Mel-frequency cepstral coefficients (MFCCs). This statistical speaker-modeling technique represents the fundamental characteristic sounds of the speech signal. Using this model, we build a dysfluency recognizer that is capable of recognizing dysfluencies irrespective of the person as well as of what is being said. The performance of the system is evaluated for different types of dysfluencies, such as syllable repetition, word repetition, prolongation, and interjection, using speech samples from the University College London Archive of Stuttered Speech (UCLASS).
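A minimal sketch of such a GMM-based recognizer, purely illustrative rather than the paper's system: one GMM per dysfluency class is trained on MFCC frames, and a test sample is assigned to the class whose model gives the highest average frame log-likelihood. It assumes librosa for MFCC extraction and scikit-learn for the GMMs; the class labels and data layout below are assumptions.

```python
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture


def mfcc_frames(signal, sr=16000, n_mfcc=13):
    """MFCC feature matrix with one row per analysis frame."""
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T


def train_class_models(training_data, n_components=16):
    """training_data maps a dysfluency label to a list of audio signals."""
    models = {}
    for label, signals in training_data.items():
        feats = np.vstack([mfcc_frames(s) for s in signals])
        models[label] = GaussianMixture(n_components=n_components,
                                        covariance_type="diag",
                                        random_state=0).fit(feats)
    return models


def classify(signal, models):
    """Return the class whose GMM gives the highest mean frame log-likelihood."""
    feats = mfcc_frames(signal)
    scores = {label: gmm.score(feats) for label, gmm in models.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = np.random.default_rng(3)

    def fake_recording(seconds=1.0, sr=16000):
        # Synthetic stand-in; real use would load labelled UCLASS segments.
        return rng.standard_normal(int(seconds * sr)).astype(np.float32)

    data = {"repetition": [fake_recording() for _ in range(3)],
            "prolongation": [fake_recording() for _ in range(3)],
            "interjection": [fake_recording() for _ in range(3)]}
    models = train_class_models(data, n_components=4)
    label, scores = classify(fake_recording(), models)
    print("predicted class:", label)
```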

