Combining evidence from residual phase and MFCC features for speaker recognition

2006, Vol 13 (1), pp. 52-55
Author(s): K.S.R. Murty, B. Yegnanarayana

2013, Vol 22 (3), pp. 241-251
Author(s): B.G. Nagaraja, H.S. Jayanna

Abstract In this work, the significance of combining the evidence from multitaper mel-frequency cepstral coefficient (MFCC), linear prediction residual (LPR), and linear prediction residual phase (LPRP) features for multilingual speaker identification under limited data conditions is demonstrated. The LPR is derived from linear prediction analysis, and the LPRP is obtained by dividing the LPR by its Hilbert envelope. Sine-weighted cepstrum estimators (SWCE) with six tapers are used for multitaper MFCC feature extraction. A Gaussian mixture model–universal background model (GMM-UBM) is used to model each speaker for each type of evidence. The evidence from the three features is then combined at the score level to improve performance. Monolingual, crosslingual, and multilingual speaker identification studies were conducted using 30 randomly selected speakers from the IITG multivariability speaker recognition database. The experimental results show that the combined evidence improves performance by nearly 8–10% compared with the individual evidence.
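The LPR and LPRP extraction steps, followed by score-level fusion, can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: LP analysis uses the autocorrelation method with Levinson-Durbin recursion, the Hilbert envelope is taken as the magnitude of an FFT-based analytic signal, and the LP order (10), the Hamming window, the synthetic test frame, z-normalisation before fusion, and equal fusion weights are all hypothetical choices. The function names are illustrative, and the per-speaker scores in the fusion example are made-up numbers standing in for GMM-UBM outputs.

```python
import numpy as np


def lp_coefficients(frame, order):
    """LP coefficients a[0..order], a[0] = 1, via autocorrelation + Levinson-Durbin."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    if r[0] <= 0:  # silent frame: no prediction possible
        return a
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i of the recursion
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def lp_residual(frame, order):
    """Inverse-filter the frame with A(z) = 1 + sum_j a_j z^-j to get the LP residual."""
    a = lp_coefficients(frame, order)
    return np.convolve(frame, a)[: len(frame)]


def hilbert_envelope(x):
    """Magnitude of the analytic signal, built with an FFT-based Hilbert transform."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1 : n // 2] = 2.0
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


def residual_phase(frame, order=10, eps=1e-12):
    """LPRP: the LP residual divided sample by sample by its Hilbert envelope."""
    res = lp_residual(frame, order)
    return res / (hilbert_envelope(res) + eps)


def fuse_scores(evidence_scores, weights):
    """Weighted sum of per-speaker scores from several evidence streams.

    Each stream is z-normalised before summing (an assumption, not from the paper).
    """
    fused = np.zeros(len(evidence_scores[0]))
    for scores, w in zip(evidence_scores, weights):
        s = np.asarray(scores, dtype=float)
        fused += w * (s - s.mean()) / (s.std() + 1e-12)
    return fused


# Hypothetical 50 ms frame at 8 kHz: a 150 Hz tone plus a little noise
rng = np.random.default_rng(0)
t = np.arange(400) / 8000.0
frame = np.sin(2 * np.pi * 150 * t) + 0.1 * rng.standard_normal(t.size)
lprp = residual_phase(frame * np.hamming(frame.size), order=10)

# Made-up scores for 3 speakers from MFCC, LPR and LPRP models
fused = fuse_scores(
    [[-41.2, -39.8, -40.5], [-12.0, -11.1, -11.9], [-7.3, -7.0, -7.6]],
    [1 / 3, 1 / 3, 1 / 3],
)
best_speaker = int(np.argmax(fused))
```

Because the real part of the analytic signal equals the residual itself, every LPRP sample lies in [-1, 1]: the division strips the amplitude envelope and keeps only the phase (cosine) information.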


2020, Vol 64 (4), pp. 40404-1-40404-16
Author(s): I.-J. Ding, C.-M. Ruan

Abstract With rapid developments in Internet of Things (IoT) techniques, smart service applications such as voice-command-based speech recognition and smart care applications such as context-aware emotion recognition will attract much attention and may become requirements in smart home or office environments. In such intelligent applications, recognizing the identity of a specific person in an indoor space will be a crucial issue. In this study, a combined audio-visual identity recognition approach was developed. In this approach, visual information obtained from face detection was incorporated into acoustic Gaussian likelihood calculations for constructing speaker classification trees, significantly enhancing the Gaussian mixture model (GMM)-based speaker recognition method. The study took the privacy of the monitored person into account and reduced the degree of surveillance. A Kinect sensor, which contains a microphone array, was used to capture the person's voice. The proposed approach deploys only two cameras in a given indoor space to perform face detection and quickly determine the total number of people in that space. This head count was then used to regulate the design of the GMM speaker classification tree. Two face-detection-regulated speaker classification tree schemes are presented for the GMM speaker recognition method: the binary speaker classification tree (GMM-BT) and the non-binary speaker classification tree (GMM-NBT). GMM-BT and GMM-NBT achieve identity recognition rates of 84.28% and 83%, respectively, both higher than the 80.5% of the conventional GMM approach.
Moreover, because the computationally expensive face recognition used in general audio-visual speaker recognition is not required, the proposed approach is fast and efficient, adding only 0.051 s to the average recognition time.
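The abstract does not specify how the head count shapes the classification tree, so the sketch below covers only the conventional GMM identification step that GMM-BT and GMM-NBT are compared against: each enrolled speaker has a diagonal-covariance GMM, and a test utterance is assigned to the speaker whose model gives the highest average frame log-likelihood. The class `DiagGMM`, the function `identify`, and the two toy speaker models are hypothetical; model training (for example by EM) is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class DiagGMM:
    weights: list[float]          # mixture weights, summing to 1
    means: list[list[float]]      # one mean vector per component
    variances: list[list[float]]  # one diagonal variance vector per component

    def log_likelihood(self, x: list[float]) -> float:
        """log p(x) for one feature vector, combined with log-sum-exp."""
        comps = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            ll = math.log(w)
            for xi, m, v in zip(x, mu, var):
                ll -= 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            comps.append(ll)
        top = max(comps)
        return top + math.log(sum(math.exp(c - top) for c in comps))


def identify(frames, models):
    """Return the speaker whose GMM gives the highest average frame log-likelihood."""
    def avg_ll(gmm):
        return sum(gmm.log_likelihood(f) for f in frames) / len(frames)
    return max(models, key=lambda name: avg_ll(models[name]))


# Hypothetical two-speaker setup with 2-D features
models = {
    "speaker_A": DiagGMM([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "speaker_B": DiagGMM([0.5, 0.5], [[4.0, 4.0], [5.0, 5.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
test_frames = [[0.2, 0.1], [0.9, 1.2], [0.5, 0.4]]
print(identify(test_frames, models))  # speaker_A
```

Log-sum-exp keeps the mixture log-likelihood numerically stable when individual component densities are very small, which is common with high-dimensional cepstral features.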


Author(s): A. Nagesh

The feature vectors of a speaker identification (SID) system play a crucial role in its overall performance. Many feature extraction methods based on MFCC have been proposed, but the ultimate goal is to maximize SID performance. The objective of this paper is to derive a new set of feature vectors based on Gammatone Frequency Cepstral Coefficients (GFCC), modeled with a Gaussian Mixture Model (GMM), for speaker identification. MFCCs are the default feature vectors for speaker recognition, but they are not very robust in the presence of additive noise. In recent studies, GFCC features have shown good robustness against noise and acoustic change. The main idea is to use GFCC features with GMM-based modeling to improve speaker identification performance under low signal-to-noise ratio (SNR) conditions.
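The abstract does not give the GFCC recipe. The sketch below follows one common formulation, stated here as an assumption: a bank of fourth-order gammatone filters with centre frequencies equally spaced on the Glasberg and Moore ERB-rate scale, per-channel frame energy, cubic-root compression, and a DCT-II to decorrelate the channels. The channel count (32), frequency range, impulse-response length, unit-energy filter normalisation, number of coefficients (13), and all function names are hypothetical; the GMM modeling stage is not shown.

```python
import numpy as np


def erb(f):
    """Equivalent rectangular bandwidth in Hz (Glasberg & Moore, 1990)."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)


def erb_space(f_low, f_high, n):
    """n centre frequencies equally spaced on the ERB-rate scale."""
    lo = 21.4 * np.log10(1.0 + 0.00437 * f_low)
    hi = 21.4 * np.log10(1.0 + 0.00437 * f_high)
    rates = np.linspace(lo, hi, n)
    return (10.0 ** (rates / 21.4) - 1.0) / 0.00437


def gammatone_ir(fc, fs, duration=0.05, order=4, b=1.019):
    """Sampled gammatone impulse response t^(n-1) exp(-2*pi*b*ERB*t) cos(2*pi*fc*t)."""
    t = np.arange(int(duration * fs)) / fs
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * erb(fc) * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g ** 2))  # unit-energy normalisation (an assumption)


def dct2(x, n_out):
    """Unnormalised DCT-II of x, first n_out coefficients."""
    n = len(x)
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    return np.sum(x[None, :] * np.cos(np.pi * k * (2 * m + 1) / (2 * n)), axis=1)


def gfcc(frame, fs, n_channels=32, n_ceps=13, f_low=50.0):
    """Hypothetical GFCC for one frame: gammatone energies -> cube root -> DCT-II."""
    centres = erb_space(f_low, 0.45 * fs, n_channels)
    energies = np.array([
        # causal filtering, truncated to the frame length
        np.mean(np.convolve(frame, gammatone_ir(fc, fs))[: len(frame)] ** 2)
        for fc in centres
    ])
    return dct2(np.cbrt(energies), n_ceps)


# Hypothetical 25 ms frame of a 440 Hz tone at 16 kHz
fs = 16000
t = np.arange(int(0.025 * fs)) / fs
frame = np.sin(2 * np.pi * 440 * t)
coeffs = gfcc(frame, fs)
```

Cubic-root compression is used in place of the logarithm applied in MFCC extraction; it is often cited as one reason GFCCs degrade more gracefully at low SNR, since it does not amplify near-zero channel energies the way a log does.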


1997, Vol 12 (4), pp. 225-229
Author(s): Carin A-S. Gustavsson, Christofer T. Lindgren, Mikael E. Lindström

Abstract The amount of lignin reacting according to the slow residual phase, i.e. the residual phase lignin, is in many respects an interesting issue. The purpose of the present investigation was to develop a mathematical model showing how the amount of residual phase lignin in the kraft cooking of spruce chips (Picea abies) depends on the conditions in the earlier phases of the cook. The variables studied were hydroxide ion concentration, hydrogen sulfide ion concentration, and ionic strength. The liquor-to-wood ratio during pulping was kept very high to maintain approximately constant chemical concentrations throughout each experiment (so-called "constant composition" cooks). An increase in hydroxide ion concentration and/or hydrogen sulfide ion concentration leads to a decrease in the amount of residual phase lignin, while an increase in ionic strength, i.e. sodium ion concentration, leads to an increase. A significant result is that the hydrogen sulfide ion concentration has a pronounced influence on the amount of residual phase lignin during a cook at a low hydroxide ion concentration. The amount of residual phase lignin, expressed as % lignin on wood, $L_R$, can be described by the following equation, developed for "constant composition" cooks at a constant sodium ion concentration of 2 mol/L: $L_R = 0.55 - 0.32\,[\mathrm{HO^-}]^{-1.3}\ln[\mathrm{HS^-}]$. This equation is valid for hydroxide ion concentrations from 0.17 to 1.4 mol/L and hydrogen sulfide ion concentrations from 0.07 to 0.6 mol/L.
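The fitted model can be evaluated directly. The sketch below assumes the equation reads $L_R = 0.55 - 0.32\,[\mathrm{HO^-}]^{-1.3}\ln[\mathrm{HS^-}]$ (concentrations in mol/L, constant [Na+] = 2 mol/L), which is how the extracted text is interpreted here; the function and constant names are hypothetical. Within the stated range ln[HS-] is negative, so the second term is positive, and this form reproduces the reported trends: raising either [HO-] or [HS-] lowers $L_R$, and [HS-] matters most when [HO-] is low.

```python
import math

HO_RANGE = (0.17, 1.4)  # mol/L, validity range stated in the abstract
HS_RANGE = (0.07, 0.6)  # mol/L


def residual_phase_lignin(ho: float, hs: float) -> float:
    """Residual phase lignin (% lignin on wood) for a 'constant composition' cook.

    Assumes the reconstructed model L_R = 0.55 - 0.32 * [HO-]**(-1.3) * ln[HS-],
    fitted at a constant sodium ion concentration of 2 mol/L.
    """
    if not (HO_RANGE[0] <= ho <= HO_RANGE[1] and HS_RANGE[0] <= hs <= HS_RANGE[1]):
        raise ValueError("concentrations outside the range the model was fitted on")
    return 0.55 - 0.32 * ho ** -1.3 * math.log(hs)
```

For example, `residual_phase_lignin(1.0, 0.3)` gives about 0.935 % lignin on wood. Rejecting out-of-range inputs keeps the empirical fit from being extrapolated beyond the cooks it was derived from.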

