On Usable Speech Detection by Linear Multi-Scale Decomposition for Speaker Identification

Author(s):  
Wajdi Ghezaiel ◽  
Amel Ben Slimane ◽  
Ezzedine Ben Braiek

<p>Usable speech is a novel concept for processing co-channel speech data: it aims to extract minimally corrupted speech that remains useful for various speech processing systems. In this paper, we are interested in co-channel speaker identification (SID). We employ a newly proposed usable speech extraction method based on pitch information obtained from a linear multi-scale decomposition by the discrete wavelet transform. The idea is to retain the speech segments in which only one pitch is detected and to discard the others. The detected usable speech is then used as input to a speaker identification system. The system is evaluated on co-channel speech, and the results show a significant improvement in speaker identification across various Target-to-Interferer Ratios (TIR).</p>
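The segment-selection idea above can be illustrated with a minimal sketch (not the authors' implementation): one level of a discrete wavelet decomposition (a simple Haar filter here, purely for illustration) followed by an autocorrelation-based pitch estimate on the approximation coefficients. Segments where exactly one pitch is found would be retained.

```python
import math

def haar_dwt(signal):
    """One level of a Haar discrete wavelet transform:
    approximation = scaled pairwise sums, detail = scaled pairwise differences."""
    approx = [(signal[i] + signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def pitch_period(frame, min_lag=2):
    """Crude pitch estimate: the lag of the autocorrelation peak."""
    n = len(frame)
    best_lag, best_val = 0, 0.0
    for lag in range(min_lag, n // 2):
        val = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if val > best_val:
            best_val, best_lag = val, lag
    return best_lag

# A pure 100 Hz tone sampled at 1 kHz: a single pitch should be detected.
fs = 1000.0
tone = [math.sin(2 * math.pi * 100 * t / fs) for t in range(200)]
approx, _ = haar_dwt(tone)
# After one decomposition level the effective rate is fs/2, so the
# 100 Hz period (10 samples at fs) corresponds to a lag of 5 samples.
print(pitch_period(approx))  # → 5
```

A co-channel detector would run this per frame and flag frames with zero or multiple autocorrelation peaks as unusable.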



2020 ◽  
Vol 38 (5A) ◽  
pp. 769-778
Author(s):  
Rawia A. Mohammed ◽  
Nidaa F. Hassan ◽  
Akbas E. Ali

The performance of Speaker Identification Systems (SIS) has improved thanks to recent developments in speech processing methods; however, text-independent speaker identification in the Arabic language still needs improvement. Despite tremendous progress in applied SIS technology, it remains largely limited to English and a few other languages. This paper aims to design an efficient text-independent SIS for the Arabic language. The proposed system uses speech signal features for speaker identification and comprises two phases. The first phase is training, in which a corpus of reference data is built; it serves as the reference for comparing and identifying speakers in the second phase. The second phase is testing, which performs the actual speaker identification. In this system, features are extracted from the Mel Frequency Cepstrum Coefficients (MFCC), mathematical calculations of voice frequency, and the voice fundamental frequency. The machine learning classification techniques K-nearest neighbors, Sequential Minimal Optimization, and Logistic Model Tree are used in the classification process. K-nearest neighbors proved to be the best classification technique, achieving the highest precision of 94.8%.
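The classification back end described above can be sketched as a basic k-nearest-neighbors vote over per-utterance feature vectors (a toy illustration, not the paper's system; the 2-D vectors stand in for MFCC-derived statistics):

```python
import math
from collections import Counter

def knn_predict(train, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest
    training vectors under Euclidean distance."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train, labels))
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Toy 2-D "feature vectors" standing in for per-utterance MFCC statistics.
train = [(0.0, 0.1), (0.1, 0.0), (1.0, 1.1), (1.1, 1.0)]
labels = ["spk_A", "spk_A", "spk_B", "spk_B"]
print(knn_predict(train, labels, (0.05, 0.05)))  # → spk_A
```

In the paper's setup, the training phase would populate `train`/`labels` from the reference corpus, and the testing phase would issue `knn_predict` queries.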


Author(s):  
Abrham Debasu Mengistu ◽  
Dagnachew Melesew Alemayehu

<p>In Ethiopia, the largest ethnic and linguistic groups are the Oromos, Amharas, and Tigrayans. This paper presents a performance analysis of a text-independent speaker identification system for the Amharic language in noisy environments. VQ (Vector Quantization), GMM (Gaussian Mixture Models), BPNN (Back-propagation Neural Network), MFCC (Mel-Frequency Cepstrum Coefficients), GFCC (Gammatone Frequency Cepstral Coefficients), and a hybrid approach were used as techniques for identifying speakers of the Amharic language in noisy environments. For the identification process, speech signals were collected from different speakers of both sexes; the data set comprises speech samples from a total of 90 speakers, with 10 seconds of speech from each individual. On these speakers, accuracies of 59.2%, 70.9%, and 84.7% are achieved when VQ, GMM, and BPNN, respectively, are used on the combined feature vector of MFCC and GFCC.</p>
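Of the techniques listed above, vector quantization is the simplest to sketch: each enrolled speaker gets a codebook of feature vectors, and a test utterance is assigned to the speaker whose codebook gives the lowest average distortion. This is an illustrative toy (2-D vectors stand in for combined MFCC+GFCC frames), not the paper's trained system:

```python
import math

def avg_distortion(frames, codebook):
    """Mean distance from each feature frame to its nearest codeword."""
    return sum(min(math.dist(f, c) for c in codebook)
               for f in frames) / len(frames)

def vq_identify(frames, codebooks):
    """Pick the speaker whose codebook yields the lowest distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(frames, codebooks[spk]))

# Hypothetical two-speaker codebooks; real ones come from k-means over
# training frames of the combined MFCC+GFCC feature vectors.
codebooks = {
    "spk1": [(0.0, 0.0), (0.2, 0.2)],
    "spk2": [(1.0, 1.0), (1.2, 1.2)],
}
test_frames = [(0.1, 0.1), (0.15, 0.2), (0.05, 0.0)]
print(vq_identify(test_frames, codebooks))  # → spk1
```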


2021 ◽  
Vol 38 (6) ◽  
pp. 1793-1799
Author(s):  
Shivaprasad Satla ◽  
Sadanandam Manchala

Dialect identification is the process of identifying the dialects of a particular standard language. Telugu is a historically important language and, like many other languages, contains three main dialects: Telangana, Coastal Andhra, and Rayalaseema. Research on dialect identification lags far behind language identification because of the dearth of databases. In any dialect identification system, the database and feature engineering play vital roles, because most words are similar in pronunciation; moreover, most researchers apply statistical approaches such as the Hidden Markov Model (HMM) and the Gaussian Mixture Model (GMM) to speech processing applications. In today's world, however, neural networks play a vital role across application domains and produce good results. One type of neural network is the Deep Neural Network (DNN), which has been used to achieve state-of-the-art performance in several fields such as speech recognition and speaker identification. Here, a DNN-based Multilayer Perceptron model is used to identify the regional dialects of the Telugu language using enhanced Mel Frequency Cepstral Coefficient (MFCC) features. To this end, a database of the Telugu dialects with a duration of 5 h 45 m was created, collected from different speakers in different environments. The results produced by the DNN model are compared with the HMM and GMM models, and it is observed that the DNN model provides good performance.
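The multilayer-perceptron classifier described above reduces, at inference time, to a forward pass through dense layers followed by an argmax over the dialect logits. A minimal sketch with hand-set toy weights (a real system would learn them from MFCC frames; none of these values come from the paper):

```python
def relu(v):
    return [max(0.0, x) for x in v]

def dense(x, W, b):
    """Fully connected layer: y = W x + b."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j
            for row, b_j in zip(W, b)]

def mlp_dialect(x, params, dialects):
    """Two-layer perceptron; the predicted dialect is the argmax logit."""
    (W1, b1), (W2, b2) = params
    h = relu(dense(x, W1, b1))
    logits = dense(h, W2, b2)
    return dialects[max(range(len(logits)), key=logits.__getitem__)]

# Toy weights mapping a 2-D input to three dialect logits.
params = [
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
    ([[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.1]),
]
dialects = ["Telangana", "Coastal Andhra", "Rayalaseema"]
print(mlp_dialect([2.0, 0.5], params, dialects))  # → Telangana
```

Training such a model would fit `params` by backpropagation on labeled MFCC feature vectors from the dialect corpus.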


2021 ◽  
Author(s):  
Chander Prabha ◽  
Sukhvinder Kaur ◽  
Meenu Gupta ◽  
Fadi Al-Turjman

Abstract An important application of speech processing is speaker recognition, which automatically recognizes the person speaking in an audio recording on the basis of the speaker-specific information included in the speech features. It involves speaker verification and speaker identification. This paper presents an efficient method based on the discrete wavelet transform and an optimized variance spectral flux to enhance the performance of a speaker identification system. An effective feature extraction technique uses the Daubechies 40 (db40) wavelet to compress and de-noise the speech signal by decomposing it into approximation and detail coefficients at level 1. The approximation coefficients contain 99.9% of the speech information compared to the detail coefficients. The optimized variance spectral flux is therefore applied to the wavelet approximation coefficients, which efficiently extracts the frequency contents of the speech signal and yields unique features. The distance between extracted features is obtained by applying the traditional Bayesian information criterion. Experimental results were computed on recordings of 33 speakers (23 female and 10 male) for text-independent speaker identification. The effectiveness of the proposed system is evaluated using detection error trade-off curves, the receiver operating characteristic, and the area under the curve. The proposed method achieves 94.38% speaker identification, compared with 90.70% for the traditional method using Mel frequency cepstral coefficients.
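Spectral flux, the core of the feature above, measures how much the magnitude spectrum changes from frame to frame. A minimal sketch (naive DFT on tiny frames, for illustration only; the paper applies an optimized variance spectral flux to db40 approximation coefficients):

```python
import cmath
import math

def mag_spectrum(frame):
    """Naive DFT magnitude spectrum (fine for short illustration frames)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) for k in range(n // 2)]

def spectral_flux(frames):
    """Frame-to-frame Euclidean change in the magnitude spectrum."""
    specs = [mag_spectrum(f) for f in frames]
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))
            for s1, s2 in zip(specs, specs[1:])]

# Two identical frames then a louder one: flux is zero, then positive.
f0 = [math.sin(2 * math.pi * k / 8) for k in range(8)]
f1 = [2 * x for x in f0]
flux = spectral_flux([f0, f0, f1])
print([round(v, 3) for v in flux])  # → [0.0, 4.0]
```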


2020 ◽  
pp. 65-72
Author(s):  
V. V. Savchenko ◽  
A. V. Savchenko

This paper addresses the distortions present in a speech signal transmitted over a communication channel to a biometric system during voice-based remote identification. We propose to preliminarily correct the frequency spectrum of the received signal based on the pre-distortion principle. Taking a priori uncertainty into account, a new information indicator of speech signal distortion and a method for measuring it under small samples of observations are proposed. An example of a fast practical implementation of the method, based on a parametric spectral analysis algorithm, is considered. Experimental results for our approach are provided for three different versions of the communication channel. It is shown that the proposed method makes it possible to bring the initially distorted speech signal into compliance with the registered voice template according to an acceptable information discrimination criterion. It is demonstrated that our approach may be used in existing biometric systems and speaker identification technologies.
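The correction idea can be caricatured as inverse filtering: divide the received magnitude spectrum by the estimated channel response. This is only a schematic sketch of the pre-distortion principle; the paper's actual method uses a parametric spectral analysis algorithm and an information-theoretic distortion indicator, not this naive division.

```python
def correct_spectrum(received_mag, channel_mag, floor=1e-6):
    """Inverse-filter the received magnitude spectrum with the estimated
    channel response, flooring tiny channel values to avoid blow-ups."""
    return [r / max(h, floor) for r, h in zip(received_mag, channel_mag)]

# A flat 'template' spectrum distorted by a hypothetical low-pass channel.
template = [1.0, 1.0, 1.0, 1.0]
channel = [1.0, 0.8, 0.5, 0.25]
received = [t * h for t, h in zip(template, channel)]
print(correct_spectrum(received, channel))  # → [1.0, 1.0, 1.0, 1.0]
```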

