Combining evidence from residual phase and MFCC features for speaker recognition

Abstract In this work, the significance of combining the evidence from multitaper mel-frequency cepstral coefficient (MFCC), linear prediction residual (LPR), and linear prediction residual phase (LPRP) features for multilingual speaker identification under limited-data conditions is demonstrated. The LPR is derived from linear prediction analysis, and the LPRP is obtained by dividing the LPR by its Hilbert envelope. The sine-weighted cepstrum estimator (SWCE) with six tapers is used for multitaper MFCC feature extraction. A Gaussian mixture model–universal background model (GMM-UBM) is used to model each speaker for each type of evidence. The evidence is then combined at the score level to improve performance. Monolingual, crosslingual, and multilingual speaker identification studies were conducted using 30 randomly selected speakers from the IITG multivariability speaker recognition database. The experimental results show that the combined evidence improves performance by nearly 8–10% compared with the individual evidence.
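
The LPR and LPRP steps described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: the LP order (10), the 160-sample non-overlapping Hamming-windowed frames, and the helper names `lpc`, `lp_residual`, `hilbert_envelope`, and `residual_phase` are all assumptions. The residual is the output of the inverse filter A(z), and dividing it by the magnitude of its analytic signal leaves only the cosine of the instantaneous phase.

```python
import numpy as np

# Illustrative sketch with hypothetical helper names; frame size and LP order are assumptions.


def lpc(frame, order):
    """LP coefficients a = [1, a1, ..., ap] of A(z) = 1 + sum_k a_k z^-k,
    estimated with the autocorrelation method and Levinson-Durbin recursion."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    if r[0] <= 0.0:  # silent frame: fall back to the identity filter
        return a
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                       # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k                   # updated prediction-error power
    return a


def lp_residual(s, order=10, frame_len=160):
    """Frame-wise inverse filtering e[n] = sum_k a_k s[n-k] on non-overlapping frames.
    Samples after the last complete frame are left at zero."""
    s = np.asarray(s, dtype=float)
    e = np.zeros_like(s)
    window = np.hamming(frame_len)
    for start in range(0, len(s) - frame_len + 1, frame_len):
        a = lpc(s[start : start + frame_len] * window, order)
        lo = max(0, start - order)           # carry `order` past samples as filter history
        seg = s[lo : start + frame_len]
        e[start : start + frame_len] = np.convolve(seg, a)[: len(seg)][start - lo :]
    return e


def hilbert_envelope(x):
    """Magnitude of the analytic signal, built with the FFT."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1 : n // 2] = 2.0
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


def residual_phase(e, eps=1e-12):
    """LPRP: the LP residual divided by its Hilbert envelope, i.e. the cosine of the
    instantaneous phase of the analytic residual; values lie in [-1, 1]."""
    return e / (hilbert_envelope(e) + eps)
```

How frame-level feature vectors are then derived from the LPR and LPRP signals for GMM-UBM modeling is not specified in the abstract and is not shown here.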

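The abstract states that the evidence is combined at the score level but does not give the fusion rule. The sketch below assumes average log-likelihood-ratio (LLR) scoring of test frames against each speaker's diagonal-covariance GMM and the UBM, followed by z-normalization of each evidence's scores across speakers and an equal-weight sum. Model training and MAP adaptation are omitted, and the names `gmm_loglik`, `llr_score`, and `fuse_scores` are hypothetical.

```python
import numpy as np

# Illustrative sketch: the LLR scoring and z-norm + weighted-sum fusion rule are assumptions.


def gmm_loglik(X, weights, means, variances):
    """Per-frame log-likelihood of X (T x D) under a diagonal-covariance GMM with
    weights (M,), means (M, D) and variances (M, D)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    diff = X[:, None, :] - means[None, :, :]                    # (T, M, D)
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)   # (T, M)
    log_det = np.sum(np.log(variances), axis=1)                 # (M,)
    d = X.shape[1]
    log_comp = np.log(weights)[None, :] - 0.5 * (d * np.log(2 * np.pi) + log_det[None, :] + mahal)
    peak = log_comp.max(axis=1, keepdims=True)                  # log-sum-exp over components
    return peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))


def llr_score(X, speaker_gmm, ubm_gmm):
    """Average per-frame log-likelihood ratio of a speaker model against the UBM."""
    return float(np.mean(gmm_loglik(X, *speaker_gmm) - gmm_loglik(X, *ubm_gmm)))


def fuse_scores(score_matrix, fusion_weights=None):
    """score_matrix[i, j] = score of evidence i (e.g. MFCC, LPR, LPRP) for speaker j.
    Each row is z-normalized across speakers, then rows are combined by a weighted sum.
    Returns (identified speaker index, fused scores)."""
    S = np.asarray(score_matrix, dtype=float)
    S = (S - S.mean(axis=1, keepdims=True)) / (S.std(axis=1, keepdims=True) + 1e-12)
    n_evidence = S.shape[0]
    if fusion_weights is None:
        w = np.full(n_evidence, 1.0 / n_evidence)
    else:
        w = np.asarray(fusion_weights, dtype=float)
    fused = w @ S
    return int(np.argmax(fused)), fused
```

Equal weights are only a default here; fusion weights would typically be tuned on held-out data.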

Machine Learning for Speaker Recognition

10.1017/9781108552332 ◽

2020 ◽

Cited By ~ 2

Author(s):

Man-Wai Mak ◽

Jen-Tzung Chien

Keyword(s):

Machine Learning ◽

Speaker Recognition

Integrated approach to speaker recognition in forensic applications

International Journal of Speech Language and the Law ◽

10.1558/ijsll.v3i1.50 ◽

2013 ◽

Vol 3 (1) ◽

pp. 50-64

Author(s):

Wojciech Majewski ◽

Czeslaw Basztura

Keyword(s):

Speaker Recognition ◽

Integrated Approach

TEXT-INDEPENDENT SPEAKER RECOGNITION USING COMBINED LPC AND MFC COEFFICIENTS

International Journal of Research in Engineering and Technology ◽

10.15623/ijret.2014.0306095 ◽

2014 ◽

Vol 03 (06) ◽

pp. 508-514

Author(s):

PPS Subhashini

Keyword(s):

Speaker Recognition

Speaker Identity Recognition by Acoustic and Visual Data Fusion through Personal Privacy for Smart Care and Service Applications

Journal of Imaging Science and Technology ◽

10.2352/j.imagingsci.technol.2020.64.4.040404 ◽

2020 ◽

Vol 64 (4) ◽

pp. 40404-1-40404-16

Author(s):

I.-J. Ding ◽

C.-M. Ruan

Keyword(s):

Face Detection ◽

Speaker Recognition ◽

Visual Information ◽

Classification Tree ◽

Gaussian Mixture ◽

Recognition Method ◽

Indoor Space ◽

Identity Recognition ◽

Visual Identity ◽

Speaker Classification

Abstract With rapid developments in Internet of Things techniques, smart service applications such as voice-command-based speech recognition and smart care applications such as context-aware emotion recognition are attracting much attention and may become requirements in smart home or office environments. In such intelligent applications, recognizing the identity of a specific member in an indoor space is a crucial issue. In this study, a combined audio-visual identity recognition approach was developed. In this approach, visual information obtained from face detection was incorporated into acoustic Gaussian likelihood calculations for constructing speaker classification trees, significantly enhancing the Gaussian mixture model (GMM)-based speaker recognition method. The study takes the privacy of the monitored person into account and reduces the degree of surveillance. The popular Kinect sensor, which contains a microphone array, was used to capture the person's voice. The proposed audio-visual approach deploys only two cameras in a specific indoor space, which is enough to perform face detection conveniently and to quickly determine the total number of people present. This head count, obtained from face detection, is then used to regulate the design of the GMM speaker classification tree. Two face-detection-regulated speaker classification tree schemes are presented for the GMM speaker recognition method: the binary speaker classification tree (GMM-BT) and the non-binary speaker classification tree (GMM-NBT). GMM-BT and GMM-NBT achieve identity recognition rates of 84.28% and 83%, respectively, both higher than the 80.5% of the conventional GMM approach. Moreover, because the computationally expensive face recognition used in typical audio-visual speaker recognition is not required, the proposed approach is fast and efficient, adding only 0.051 s to the average recognition time.

Speaker recognition based on dynamic time warping and Gaussian mixture model

2020 39th Chinese Control Conference (CCC) ◽

10.23919/ccc50068.2020.9188632 ◽

2020 ◽

Author(s):

Nannan Zhang ◽

Yanru Yao

Keyword(s):

Gaussian Mixture Model ◽

Mixture Model ◽

Speaker Recognition ◽

Dynamic Time Warping ◽

Gaussian Mixture ◽

Time Warping ◽

Dynamic Time

New Feature Vectors using GFCC for Speaker Identification

International Journal of Emerging Research in Management and Technology ◽

10.23956/ijermt.v6i8.146 ◽

2018 ◽

Vol 6 (8) ◽

pp. 243

Author(s):

A. Nagesh

Keyword(s):

Speaker Recognition ◽

Speaker Identification ◽

Signal To Noise Ratio ◽

Main Idea ◽

Extraction Methods ◽

Identification System ◽

Identification Performance ◽

Feature Vectors ◽

Overall Performance ◽

New Feature

The feature vectors of a speaker identification (SID) system play a crucial role in its overall performance. Many new feature extraction methods build on MFCC, but the ultimate goal is to maximize SID performance. The objective of this paper is to derive a new set of feature vectors based on Gammatone Frequency Cepstral Coefficients (GFCC) and to use a Gaussian mixture model (GMM) for speaker identification. MFCC are the default feature vectors for speaker recognition, but they are not very robust in the presence of additive noise. In recent studies, GFCC features have shown very good robustness against noise and acoustic change. The main idea is that GMM-based identification using GFCC features improves overall speaker identification performance under low signal-to-noise ratio (SNR) conditions.
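
A commonly described GFCC pipeline (gammatone filterbank with centre frequencies equally spaced on the ERB-rate scale, per-frame channel energies, cubic-root compression, then a DCT) can be sketched with NumPy as below. The filterbank size, frame settings, FIR filter length, and the function names `erb_space`, `gammatone_ir`, and `gfcc` are assumptions for illustration; the paper's exact extraction and its GMM stage are not reproduced.

```python
import numpy as np

# Illustrative GFCC-style sketch; all parameter values and names are assumptions.


def erb_space(low_hz, high_hz, n):
    """n centre frequencies equally spaced on the Glasberg-Moore ERB-rate scale."""
    to_erb = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)
    from_erb = lambda e: (10.0 ** (e / 21.4) - 1.0) / 4.37e-3
    return from_erb(np.linspace(to_erb(low_hz), to_erb(high_hz), n))


def gammatone_ir(fc, fs, duration=0.05, order=4, b=1.019):
    """Unit-energy FIR approximation of a gammatone filter (default order 4) centred at fc."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37e-3 * fc + 1.0)                 # equivalent rectangular bandwidth (Hz)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * erb * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))


def gfcc(signal, fs, n_filters=32, n_coeffs=13, frame_s=0.025, hop_s=0.010, fmin=50.0):
    """(frames x n_coeffs) matrix: gammatone filterbank -> frame energies ->
    cubic-root compression -> DCT-II across channels."""
    signal = np.asarray(signal, dtype=float)
    fcs = erb_space(fmin, 0.9 * fs / 2, n_filters)
    frame, hop = int(frame_s * fs), int(hop_s * fs)
    n_frames = 1 + (len(signal) - frame) // hop
    starts = hop * np.arange(n_frames)
    energies = np.empty((n_frames, n_filters))
    for j, fc in enumerate(fcs):
        y = np.convolve(signal, gammatone_ir(fc, fs), mode="same")
        energies[:, j] = [np.mean(y[s : s + frame] ** 2) for s in starts]
    compressed = np.cbrt(energies)                    # cubic-root loudness compression
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n + 1) / (2 * n_filters))
    return compressed @ basis.T                       # unnormalized DCT-II
```

Each row of the returned matrix would serve as one frame-level feature vector for GMM training and scoring.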

Residual phase lignin in kraft cooking related to the conditions in the cook

Nordic Pulp & Paper Research Journal ◽

10.3183/npprj-1997-12-04-p225-23 ◽

1997 ◽

Vol 12 (4) ◽

pp. 225-229

Author(s):

Catrin A-S. Gustavsson ◽

Christofer T. Lindgren ◽

Mikael E. Lindström

Keyword(s):

Hydrogen Sulfide ◽

Ionic Strength ◽

Sodium Ion ◽

Ion Concentration ◽

Constant Composition ◽

Hydroxide Ion ◽

Sulfide Ion ◽

Residual Phase ◽

Kraft Cooking ◽

Very High

Abstract The amount of lignin reacting according to the slow residual phase, i.e. the residual phase lignin, is in many respects an interesting issue. The purpose of the present investigation was to develop a mathematical model showing how the amount of residual phase lignin in the kraft cooking of spruce chips (Picea abies) depends on the conditions in the earlier phases of the cook. The variables studied were hydroxide ion concentration, hydrogen sulfide ion concentration, and ionic strength. The liquor-to-wood ratio during pulping was very high in order to maintain approximately constant chemical concentrations throughout each experiment (so-called "constant composition" cooks). An increase in hydroxide ion concentration and/or hydrogen sulfide ion concentration leads to a decrease in the amount of residual phase lignin, while an increase in ionic strength, i.e. sodium ion concentration, leads to an increase. A significant result is that the hydrogen sulfide ion concentration has a pronounced influence on the amount of residual phase lignin during a cook at a low hydroxide ion concentration. The amount of residual phase lignin, expressed as % lignin on wood, $L_R$, can be described by the following equation developed for "constant composition" cooks (when cooking with a constant sodium ion concentration of 2 mol/L): $L_R = 0.55 - 0.32\,[\mathrm{HO^-}]^{-1.3}\ln[\mathrm{HS^-}]$. This equation is valid for a hydroxide ion concentration from 0.17 to 1.4 mol/L and a hydrogen sulfide ion concentration from 0.07 to 0.6 mol/L.
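
The model equation can be evaluated directly. This small Python helper assumes the equation reads $L_R = 0.55 - 0.32\,[\mathrm{HO^-}]^{-1.3}\ln[\mathrm{HS^-}]$ with both concentrations in mol/L, and it rejects inputs outside the stated validity ranges; the function name `residual_phase_lignin` is hypothetical.

```python
import math

# Hypothetical helper; assumes L_R = 0.55 - 0.32 * [HO-]^-1.3 * ln[HS-], concentrations in mol/L.


def residual_phase_lignin(ho: float, hs: float) -> float:
    """Residual phase lignin L_R (% lignin on wood) for a constant-composition kraft cook
    at [Na+] = 2 mol/L. ho, hs: hydroxide and hydrogen sulfide ion concentrations (mol/L)."""
    if not 0.17 <= ho <= 1.4:
        raise ValueError(f"[HO-] = {ho} mol/L is outside the stated 0.17-1.4 range")
    if not 0.07 <= hs <= 0.6:
        raise ValueError(f"[HS-] = {hs} mol/L is outside the stated 0.07-0.6 range")
    return 0.55 - 0.32 * ho ** -1.3 * math.log(hs)


# Same [HS-] at a high and a low hydroxide ion concentration
print(round(residual_phase_lignin(1.0, 0.3), 3))  # -> 0.935
print(round(residual_phase_lignin(0.2, 0.3), 3))  # -> 3.672
```

Comparing the two calls shows the behaviour the abstract describes: at the same [HS-], lowering [HO-] from 1.0 to 0.2 mol/L raises the predicted residual phase lignin several-fold, because the [HO-]^-1.3 factor amplifies the ln[HS-] term.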
