Improvement of Speaker Identification by Combining Prosodic Features with Acoustic Features

Author(s):  
Rong Zheng ◽  
Shuwu Zhang ◽  
Bo Xu
Author(s):  
Halim Sayoud ◽  
Siham Ouamour

Most existing speaker recognition systems use state-of-the-art acoustic features. Often, however, a speaker can be recognized only by his or her prosodic features, especially the accent. For this reason, the authors investigate several pertinent prosodic features that can be combined with classic acoustic features in order to improve recognition accuracy. The authors have developed a new prosodic model using a modified Learning Vector Quantization algorithm, called MLVQ (Modified LVQ). This model is composed of three reduced prosodic features: mean pitch, original duration, and low-frequency energy. Since these features are heterogeneous, a new optimized metric has been proposed, called the Optimized Distance for Heterogeneous Features (ODHEF). Speaker identification tests are performed on an Arabic corpus because the NIST evaluations showed that speaker verification scores depend on the spoken language, and that some of the worst scores were obtained for Arabic. Experimental results show good performance for the new prosodic approach.
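The abstract does not give the ODHEF formula itself; as a rough illustration of the underlying idea of a distance over heterogeneous prosodic features (mean pitch, duration, low-frequency energy), one can normalize each dimension by its own scale before combining, so that no single feature dominates. The function name, scales, and weights below are illustrative assumptions, not the paper's definition:

```python
import numpy as np

def heterogeneous_distance(x, y, scales, weights=None):
    """Distance between two heterogeneous prosodic feature vectors.

    Each dimension (e.g. mean pitch, duration, low-frequency energy)
    is divided by its own scale so no single feature dominates.
    Illustrative only: the paper's exact ODHEF formula is not given here.
    """
    x, y, scales = map(np.asarray, (x, y, scales))
    w = np.ones_like(x, dtype=float) if weights is None else np.asarray(weights)
    diff = (x - y) / scales          # per-feature normalization
    return float(np.sqrt(np.sum(w * diff**2)))

# Two speakers' prosodic vectors: [mean pitch (Hz), duration (s), LF energy]
a = [180.0, 0.42, 0.30]
b = [120.0, 0.55, 0.25]
scales = [50.0, 0.2, 0.1]            # assumed typical spread of each feature
print(heterogeneous_distance(a, b, scales))
```

With the per-feature scaling, a 60 Hz pitch difference and a 0.13 s duration difference contribute comparably, which is the point of an optimized metric for heterogeneous features.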


2018 ◽  
Vol 7 (2.16) ◽  
pp. 98 ◽  
Author(s):  
Mahesh K. Singh ◽  
A K. Singh ◽  
Narendra Singh

This paper presents an algorithm based on acoustic analysis of electronically disguised voice. The proposed work gives a comparative analysis of the acoustic features and their statistical coefficients. Acoustic features are computed by the Mel-frequency cepstral coefficient (MFCC) method, and normal voices are compared with voices disguised by different semitone shifts. All acoustic features are passed through feature-based classifiers to determine the identification rate for each type of electronically disguised voice. Two classifiers, a support vector machine (SVM) and a decision tree (DT), are used for speaker identification and compared in terms of classification efficiency on voices electronically disguised by different semitone shifts.
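A hedged sketch of the classification setup described above: MFCC-style feature vectors fed to SVM and decision-tree classifiers. The features here are synthetic stand-ins (random vectors with shifted means), not real MFCCs, and the hyperparameters are arbitrary:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Stand-in MFCC statistics: 13 coefficients per utterance.
# Class 0 = normal voice, class 1 = electronically disguised (pitch-shifted);
# the shifted mean crudely mimics how semitone shifts move the cepstrum.
normal = rng.normal(0.0, 1.0, size=(40, 13))
disguised = rng.normal(1.5, 1.0, size=(40, 13))
X = np.vstack([normal, disguised])
y = np.array([0] * 40 + [1] * 40)

svm = SVC(kernel="rbf").fit(X, y)
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)

print("SVM training accuracy:", svm.score(X, y))
print("DT  training accuracy:", tree.score(X, y))
```

In practice the MFCC vectors would come from framed, windowed speech; a held-out test split (rather than training accuracy) would be used to compare the two classifiers' efficiency.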


2017 ◽  
Vol 29 (1) ◽  
pp. 59-71 ◽  
Author(s):  
Karim Youssef ◽  
Katsutoshi Itoyama ◽  
Kazuyoshi Yoshii

[Figure: Efficient mobile speaker tracking] This paper jointly addresses the tasks of speaker identification and localization with binaural signals. The proposed system operates in noisy and echoic environments and requires limited computation. It demonstrates that simultaneous identification and localization can benefit from a common signal processing front end for feature extraction. Moreover, joint exploitation of the identity and position estimates allows each to limit the other's errors. Equivalent rectangular bandwidth frequency cepstral coefficients (ERBFCC) and interaural level differences (ILD) are extracted. These acoustic features are used for speaker identity and azimuth estimation, respectively, through artificial neural networks (ANNs). The system was evaluated in simulated and real environments, with still and mobile speakers. Results demonstrate its ability to produce accurate estimates in the presence of noise and reflections. Moreover, the advantage of the binaural context over the monaural context for speaker identification is shown.


2021 ◽  
Vol 11 (10) ◽  
pp. 1344
Author(s):  
Viviana Mendoza Ramos ◽  
Anja Lowit ◽  
Leen Van den Steen ◽  
Hector Arturo Kairuz Hernandez-Diaz ◽  
Maria Esperanza Hernandez-Diaz Huici ◽  
...  

Dysprosody is a hallmark of dysarthria and can affect the intelligibility and naturalness of speech. This includes sentence accent, which helps draw listeners' attention to important information in the message. Although some studies have investigated this feature, we currently lack properly validated automated procedures that can distinguish between the subtle performance differences observed across speakers with dysarthria. This study aims for cross-population validation of a set of acoustic features that have previously been shown to correlate with sentence accent. In addition, the impact of dysarthria severity level on sentence accent production is investigated. Two groups of adults, Dutch and English speakers, were analysed. Fifty-eight participants with dysarthria and 30 healthy control participants (HCP) produced sentences with varying accent positions. All speech samples were evaluated perceptually and analysed acoustically with an algorithm that extracts ten meaningful prosodic features and allows classification of accented versus unaccented syllables based on a linear combination of these parameters. The data were statistically analysed using discriminant analysis. Within the Dutch and English dysarthric populations, the algorithm correctly identified 82.8% and 91.9% of the accented target syllables, respectively, indicating that its capacity to discriminate between accented and unaccented syllables in a sentence is consistent with perceptual impressions. Moreover, different strategies for accent production across dysarthria severity levels could be demonstrated, an important step toward a better understanding of the nature of the deficit and the automatic classification of dysarthria severity using prosodic features.
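The core idea above, separating accented from unaccented syllables by a linear combination of prosodic features evaluated with discriminant analysis, can be sketched with synthetic stand-in features. The three features and their distributions below are illustrative assumptions, not the paper's ten validated parameters:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)

# Stand-in prosodic features per syllable: [F0 peak (Hz), duration (s), intensity (dB)].
# Accented syllables tend to show higher F0, longer duration, and more energy.
unaccented = rng.normal([180, 0.15, 60], [15, 0.03, 3], size=(60, 3))
accented = rng.normal([220, 0.25, 66], [15, 0.03, 3], size=(60, 3))
X = np.vstack([unaccented, accented])
y = np.array([0] * 60 + [1] * 60)

# LDA learns exactly a linear combination of the features that best
# separates the two classes, mirroring the study's analysis approach.
lda = LinearDiscriminantAnalysis().fit(X, y)
print("accent classification accuracy:", lda.score(X, y))
```

In the study itself, the identification rates (82.8% and 91.9%) come from real dysarthric speech, where the class distributions overlap far more than in this toy setup.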


Sensors ◽  
2021 ◽  
Vol 21 (15) ◽  
pp. 5097
Author(s):  
Mohammad Al-Qaderi ◽  
Elfituri Lahamer ◽  
Ahmad Rad

We present a new architecture to address the challenges of speaker identification that arise in the interaction of humans with social robots. Although deep learning systems have achieved impressive performance in many speech applications, limited speech data at the training stage and short utterances with background noise at the test stage remain open problems, as no optimum solution has been reported to date. The proposed design employs a generative model, the Gaussian mixture model (GMM), and a discriminative model, the support vector machine (SVM), together with prosodic features and short-term spectral features, to concurrently classify a speaker's gender and identity. The architecture works semi-sequentially in two stages: the first classifier exploits the prosodic features to determine the speaker's gender, which is then used together with the short-term spectral features as input to the second classifier system, which identifies the speaker. The second classifier system employs two types of short-term spectral features, mel-frequency cepstral coefficients (MFCC) and gammatone frequency cepstral coefficients (GFCC), as well as the gender information, as inputs to two different classifiers (GMM and GMM-supervector-based SVM), leading to four classifiers in total. The outputs of the second-stage classifiers, namely the GMM-MFCC maximum likelihood classifier (MLC), the GMM-GFCC MLC, the GMM-MFCC supervector SVM, and the GMM-GFCC supervector SVM, are fused at the score level by the weighted Borda count approach. The weight factors are computed on the fly via a Mamdani fuzzy inference system whose inputs are the signal-to-noise ratio and the length of the utterance.
Experimental evaluations suggest that the proposed architecture and fusion framework are promising and can improve the recognition performance of the system in challenging environments where the signal-to-noise ratio is low and utterances are short; such scenarios often arise in social robots' interactions with humans.
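The score-level fusion by weighted Borda count can be sketched as follows. The speaker names, rankings, and weights are hypothetical; in the paper the weights are not fixed but computed on the fly from the SNR and utterance length via a Mamdani fuzzy inference system:

```python
def weighted_borda_fusion(rankings, weights):
    """Fuse classifier rankings by weighted Borda count.

    rankings: list of ranked candidate lists, best first, one per classifier.
    weights:  one reliability weight per classifier.
    Returns candidates sorted by descending fused score.
    """
    scores = {}
    for ranking, w in zip(rankings, weights):
        n = len(ranking)
        for pos, speaker in enumerate(ranking):
            # Borda points: n-1 for rank 1, n-2 for rank 2, ..., 0 for last.
            scores[speaker] = scores.get(speaker, 0.0) + w * (n - 1 - pos)
    return sorted(scores, key=scores.get, reverse=True)

# Four second-stage classifiers rank three candidate speakers.
rankings = [
    ["spk2", "spk1", "spk3"],   # e.g. GMM-MFCC MLC
    ["spk2", "spk3", "spk1"],   # e.g. GMM-GFCC MLC
    ["spk1", "spk2", "spk3"],   # e.g. GMM-MFCC supervector SVM
    ["spk2", "spk1", "spk3"],   # e.g. GMM-GFCC supervector SVM
]
weights = [0.9, 0.7, 0.8, 0.6]  # hypothetical reliability weights
print(weighted_borda_fusion(rankings, weights))  # spk2 wins
```

Because three of the four classifiers rank spk2 first, it accumulates the highest weighted score even though the MFCC-supervector SVM disagrees, which is the robustness the fusion stage is designed to provide.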

