On Usable Speech Detection by Linear Multi-Scale Decomposition for Speaker Identification

Author(s):  
Wajdi Ghezaiel ◽  
Amel Ben Slimane ◽  
Ezzedine Ben Braiek

<p>Usable speech is a novel concept for processing co-channel speech data: it aims to extract minimally corrupted speech that remains useful for various speech processing systems. In this paper, we are interested in co-channel speaker identification (SID). We employ a newly proposed usable speech extraction method based on pitch information obtained from a linear multi-scale decomposition by the discrete wavelet transform. The idea is to retain the speech segments in which only one pitch is detected and to discard the others. The detected usable speech is then used as input to a speaker identification system. The system is evaluated on co-channel speech, and the results show a significant improvement in speaker identification across various Target-to-Interferer Ratios (TIR).</p>
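The segment-selection idea above can be illustrated with a minimal sketch (not the authors' implementation): one level of a discrete wavelet decomposition (a simple Haar filter here, purely for illustration) followed by an autocorrelation-based pitch estimate on the approximation coefficients. Segments where exactly one pitch is found would be retained.

```python
import math

def haar_dwt(signal):
    """One level of a Haar discrete wavelet transform:
    approximation = scaled pairwise sums, detail = scaled pairwise differences."""
    approx = [(signal[i] + signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def pitch_period(frame, min_lag=2):
    """Crude pitch estimate: the lag of the autocorrelation peak."""
    n = len(frame)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, n // 2):
        val = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if val > best_val:
            best_val, best_lag = val, lag
    return best_lag

# A pure 100 Hz tone sampled at 1 kHz: a single pitch should be detected.
fs = 1000.0
tone = [math.sin(2 * math.pi * 100 * t / fs) for t in range(200)]
approx, _ = haar_dwt(tone)
# After one decomposition level the effective rate is fs/2, so the
# 100 Hz period (10 samples at fs) corresponds to a lag of 5 samples.
print(pitch_period(approx))  # → 5
```

A co-channel detector would run this per frame and flag frames with zero or multiple autocorrelation peaks as unusable.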



2020 ◽  
Vol 38 (5A) ◽  
pp. 769-778
Author(s):  
Rawia A. Mohammed ◽  
Nidaa F. Hassan ◽  
Akbas E. Ali

The performance of Speaker Identification Systems (SIS) has improved thanks to recent developments in speech processing methods; however, text-independent speaker identification in the Arabic language still needs improvement. Despite tremendous progress in applied SIS technology, it remains largely limited to English and a few other languages. This paper aims to design an efficient text-independent SIS for the Arabic language. The proposed system uses speech signal features for speaker identification and comprises two phases. The first phase is training, in which a corpus of reference data is built; it serves as the reference for comparing and identifying speakers in the second phase. The second phase is testing, which performs the actual speaker identification. In this system, features are extracted from the Mel Frequency Cepstrum Coefficients (MFCC), mathematical calculations of voice frequency, and the voice fundamental frequency. The machine learning classification techniques K-nearest neighbors, Sequential Minimal Optimization, and Logistic Model Tree are used in the classification process. K-nearest neighbors proved to be the best classification technique, achieving the highest precision of 94.8%.
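The classification back end described above can be sketched as a basic k-nearest-neighbors vote over per-utterance feature vectors (a toy illustration, not the paper's system; the 2-D vectors stand in for MFCC-derived statistics):

```python
import math
from collections import Counter

def knn_predict(train, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training vectors under Euclidean distance."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train, labels))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D "feature vectors" standing in for per-utterance MFCC statistics.
train = [(0.0, 0.1), (0.1, 0.0), (1.0, 1.1), (1.1, 1.0)]
labels = ["spk_A", "spk_A", "spk_B", "spk_B"]
print(knn_predict(train, labels, (0.05, 0.05)))  # → spk_A
```

In the paper's setup, the training phase would populate `train`/`labels` from the reference corpus, and the testing phase would issue `knn_predict` queries.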


Author(s):  
Abrham Debasu Mengistu ◽  
Dagnachew Melesew Alemayehu

<p>In Ethiopia, the largest ethnic and linguistic groups are the Oromos, Amharas, and Tigrayans. This paper presents a performance analysis of a text-independent speaker identification system for the Amharic language in noisy environments. VQ (Vector Quantization), GMM (Gaussian Mixture Models), BPNN (Back-propagation Neural Network), MFCC (Mel-Frequency Cepstrum Coefficients), GFCC (Gammatone Frequency Cepstral Coefficients), and a hybrid approach were used as techniques for identifying speakers of the Amharic language in noisy environments. For the identification process, speech signals were collected from different speakers of both sexes; the data set comprises speech samples from a total of 90 speakers, with 10 seconds of speech from each individual. On these speakers, accuracies of 59.2%, 70.9%, and 84.7% are achieved when VQ, GMM, and BPNN, respectively, are used on the combined feature vector of MFCC and GFCC.</p>
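Of the techniques listed above, vector quantization is the simplest to sketch: each enrolled speaker gets a codebook of feature vectors, and a test utterance is assigned to the speaker whose codebook gives the lowest average distortion. This is an illustrative toy (2-D vectors stand in for combined MFCC+GFCC frames), not the paper's trained system:

```python
import math

def avg_distortion(frames, codebook):
    """Mean distance from each feature frame to its nearest codeword."""
    return sum(min(math.dist(f, c) for c in codebook)
               for f in frames) / len(frames)

def vq_identify(frames, codebooks):
    """Pick the speaker whose codebook yields the lowest distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(frames, codebooks[spk]))

# Hypothetical two-speaker codebooks; real ones come from k-means over
# training frames of the combined MFCC+GFCC feature vectors.
codebooks = {
    "spk1": [(0.0, 0.0), (0.2, 0.2)],
    "spk2": [(1.0, 1.0), (1.2, 1.2)],
}
test_frames = [(0.1, 0.1), (0.15, 0.2), (0.05, 0.0)]
print(vq_identify(test_frames, codebooks))  # → spk1
```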


2021 ◽  
Vol 38 (6) ◽  
pp. 1793-1799
Author(s):  
Shivaprasad Satla ◽  
Sadanandam Manchala

Dialect identification is the process of identifying the dialects of a particular standard language. Telugu is a historically important language and, like many other languages, contains three main dialects: Telangana, Coastal Andhra, and Rayalaseema. Research on dialect identification lags far behind language identification because of the dearth of databases. In any dialect identification system, the database and feature engineering play vital roles, because most words are similar in pronunciation; moreover, most researchers apply statistical approaches such as the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM) to speech processing applications. In today's world, however, neural networks play a vital role across application domains and produce good results. One type of neural network is the Deep Neural Network (DNN), which has been used to achieve state-of-the-art performance in several fields such as speech recognition and speaker identification. Here, a DNN-based Multilayer Perceptron model is used to identify the regional dialects of the Telugu language using enhanced Mel Frequency Cepstral Coefficient (MFCC) features. To this end, a database of the Telugu dialects with a duration of 5 h 45 m was created, collected from different speakers in different environments. The results produced by the DNN model are compared with the HMM and GMM models, and it is observed that the DNN model provides good performance.
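The multilayer-perceptron classifier described above reduces, at inference time, to a forward pass through dense layers followed by an argmax over the dialect logits. A minimal sketch with hand-set toy weights (a real system would learn them from MFCC frames; none of these values come from the paper):

```python
def relu(v):
    return [max(0.0, x) for x in v]

def dense(x, W, b):
    """Fully connected layer: y = W x + b."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j
            for row, b_j in zip(W, b)]

def mlp_dialect(x, params, dialects):
    """Two-layer perceptron; the predicted dialect is the argmax logit."""
    (W1, b1), (W2, b2) = params
    h = relu(dense(x, W1, b1))
    logits = dense(h, W2, b2)
    return dialects[max(range(len(logits)), key=logits.__getitem__)]

# Toy weights mapping a 2-D input to three dialect logits.
params = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.1]),
]
dialects = ["Telangana", "Coastal Andhra", "Rayalaseema"]
print(mlp_dialect([2.0, 0.5], params, dialects))  # → Telangana
```

Training such a model would fit `params` by backpropagation on labeled MFCC feature vectors from the dialect corpus.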


2021 ◽  
Author(s):  
Chander Prabha ◽  
Sukhvinder Kaur ◽  
Meenu Gupta ◽  
Fadi Al-Turjman

Abstract An important application of speech processing is speaker recognition, which automatically recognizes the person speaking in an audio recording on the basis of the speaker-specific information included in the speech features. It involves speaker verification and speaker identification. This paper presents an efficient method based on the discrete wavelet transform and an optimized variance spectral flux to enhance the performance of a speaker identification system. An effective feature extraction technique uses the Daubechies 40 (db40) wavelet to compress and de-noise the speech signal by decomposing it into approximation and detail coefficients at level 1. The approximation coefficients contain 99.9% of the speech information compared to the detail coefficients. The optimized variance spectral flux is therefore applied to the wavelet approximation coefficients, which efficiently extracts the frequency contents of the speech signal and yields unique features. The distance between extracted features is obtained by applying the traditional Bayesian information criterion. Experimental results were computed on recordings of 33 speakers (23 female and 10 male) for text-independent speaker identification. The effectiveness of the proposed system is evaluated using detection error trade-off curves, the receiver operating characteristic, and the area under the curve. The proposed method achieves 94.38% speaker identification, compared with 90.70% for the traditional method using Mel frequency cepstral coefficients.
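Spectral flux, the core of the feature above, measures how much the magnitude spectrum changes from frame to frame. A minimal sketch (naive DFT on tiny frames, for illustration only; the paper applies an optimized variance spectral flux to db40 approximation coefficients):

```python
import cmath
import math

def mag_spectrum(frame):
    """Naive DFT magnitude spectrum (fine for short illustration frames)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) for k in range(n // 2)]

def spectral_flux(frames):
    """Frame-to-frame Euclidean change in the magnitude spectrum."""
    specs = [mag_spectrum(f) for f in frames]
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))
            for s1, s2 in zip(specs, specs[1:])]

# Two identical frames then a louder one: flux is zero, then positive.
f0 = [math.sin(2 * math.pi * k / 8) for k in range(8)]
f1 = [2 * x for x in f0]
flux = spectral_flux([f0, f0, f1])
print([round(v, 3) for v in flux])  # → [0.0, 4.0]
```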


2020 ◽  
pp. 65-72
Author(s):  
V. V. Savchenko ◽  
A. V. Savchenko

This paper addresses the distortions present in a speech signal transmitted over a communication channel to a biometric system during voice-based remote identification. We propose to preliminarily correct the frequency spectrum of the received signal based on the pre-distortion principle. Taking a priori uncertainty into account, a new information indicator of speech signal distortion and a method for measuring it under small samples of observations are proposed. An example of a fast practical implementation of the method, based on a parametric spectral analysis algorithm, is considered. Experimental results for our approach are provided for three different versions of the communication channel. It is shown that the proposed method makes it possible to bring the initially distorted speech signal into compliance with the registered voice template according to an acceptable information discrimination criterion. It is demonstrated that our approach may be used in existing biometric systems and speaker identification technologies.
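The correction idea can be caricatured as inverse filtering: divide the received magnitude spectrum by the estimated channel response. This is only a schematic sketch of the pre-distortion principle; the paper's actual method uses a parametric spectral analysis algorithm and an information-theoretic distortion indicator, not this naive division.

```python
def correct_spectrum(received_mag, channel_mag, floor=1e-6):
    """Inverse-filter the received magnitude spectrum with the estimated
    channel response, flooring tiny channel values to avoid blow-ups."""
    return [r / max(h, floor) for r, h in zip(received_mag, channel_mag)]

# A flat 'template' spectrum distorted by a hypothetical low-pass channel.
template = [1.0, 1.0, 1.0, 1.0]
channel = [1.0, 0.8, 0.5, 0.25]
received = [t * h for t, h in zip(template, channel)]
print(correct_spectrum(received, channel))  # → [1.0, 1.0, 1.0, 1.0]
```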

