Speaker-independent HMM-based voice conversion using quantized fundamental frequency

Author(s):  
Takashi Nose ◽  
Takao Kobayashi
2010 ◽  
Vol 61 (1) ◽  
pp. 57-61 ◽  
Author(s):  
Allam Mousa

Voice Conversion Using Pitch Shifting Algorithm by Time Stretching with PSOLA and Re-SamplingVoice changing has many applications in the industry and commercial filed. This paper emphasizes voice conversion using a pitch shifting method which depends on detecting the pitch of the signal (fundamental frequency) using Simplified Inverse Filter Tracking (SIFT) and changing it according to the target pitch period using time stretching with Pitch Synchronous Over Lap Add Algorithm (PSOLA), then resampling the signal in order to have the same play rate. The same study was performed to see the effect of voice conversion when some Arabic speech signal is considered. Treatment of certain Arabic voiced vowels and the conversion between male and female speech has shown some expansion or compression in the resulting speech. Comparison in terms of pitch shifting is presented here. Analysis was performed for a single frame and a full segmentation of speech.


2014 ◽  
Vol 2014 ◽  
pp. 1-13 ◽  
Author(s):  
Jagannath Nirmal ◽  
Suprava Patnaik ◽  
Mukesh Zaveri ◽  
Pramod Kachare

The complex cepstrum vocoder is used to modify the speaker specific characteristics of the source speaker speech to that of the target speaker speech. The low time and high time liftering are used to split the calculated cepstrum into the vocal tract and the source excitation parameters. The obtained mixed phase vocal tract and source excitation parameters with finite impulse response preserve the phase properties of the resynthesized speech frame. The radial basis function is explored to capture the nonlinear mapping function for modifying the complex cepstrum based real and imaginary components of the vocal tract and source excitation of the speech signal. The state-of-the-art Mel cepstrum envelope and the fundamental frequency (F0) are considered to represent the vocal tract and the source excitation of the speech frame, respectively. Radial basis function is used to capture and formulate the nonlinear relations between the Mel cepstrum envelope of the source and target speakers. Mean and standard deviation approach is employed to modify the fundamental frequency (F0). The Mel log spectral approximation filter is used to reconstruct the speech signal from the modified Mel cepstrum envelope and fundamental frequency. A comparison of the proposed complex cepstrum based model has been made with the state-of-the-art Mel Cepstrum Envelope based voice conversion model with objective and subjective evaluations. The evaluation measures reveal that the proposed complex cepstrum based voice conversion system approximate the converted speech signal with better accuracy than the model based on the Mel cepstrum envelope based voice conversion.


2018 ◽  
Vol 33 (3) ◽  
pp. 229-246
Author(s):  
Đào Thị Lệ Thủy ◽  
Trinh Van Loan ◽  
Nguyen Hong Quang

This paper presents the results of GMM-based recognition for four basic emotions of Vietnamese such as neutral, sadness, anger and happiness. The characteristic parameters of these emotions are extracted from speech signals and divided into different parameter sets for experiments. The experiments are carried out according to speaker-dependent or speaker-independent and content-dependent or content-independent recognitions. The results showed that the recognition scores are rather high with the case for which there is a full combination of parameters as MFCC and its first and second derivatives, fundamental frequency, energy, formants and its correspondent bandwidths, spectral characteristics and F0 variants. In average, the speaker-dependent and content-dependent recognition scrore is 89.21%. Next, the average score is 82.27% for the speaker-dependent and content-independent recognition. For the speaker-independent and content-dependent recognition, the average score is 70.35%. The average score is 66.99% for speaker-independent and content-independent recognition. Information on F0 has significantly increased the score of recognition


Sign in / Sign up

Export Citation Format

Share Document