Voice Conversion Using Pitch Shifting Algorithm by Time Stretching with PSOLA and Re-Sampling

2010 ◽  
Vol 61 (1) ◽  
pp. 57-61 ◽  
Author(s):  
Allam Mousa

Voice changing has many applications in the industrial and commercial fields. This paper emphasizes voice conversion using a pitch-shifting method that detects the pitch of the signal (its fundamental frequency) using Simplified Inverse Filter Tracking (SIFT), changes it according to the target pitch period using time stretching with the Pitch Synchronous Overlap Add (PSOLA) algorithm, and then resamples the signal in order to keep the same play rate. The same study was performed to see the effect of voice conversion on Arabic speech signals. Treatment of certain Arabic voiced vowels and conversion between male and female speech showed some expansion or compression in the resulting speech. A comparison in terms of pitch shifting is presented. Analysis was performed for a single frame and for a full segmentation of speech.
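The resampling step can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`resample_linear`, `sine`, `count_rising_zero_crossings`) and the use of linear interpolation are assumptions made for illustration, and the SIFT pitch detector and PSOLA time stretching are not reproduced. The sketch only shows that reading a signal `factor` times faster multiplies its pitch by `factor` and divides its duration by `factor`, which is why a time-stretching stage is needed to restore the original duration.

```python
import math


def resample_linear(signal, factor):
    """Read `signal` `factor` times faster using linear interpolation.

    Played back at the original sample rate, the result has its pitch
    multiplied by `factor` and its duration divided by `factor`.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int((len(signal) - 1) / factor) + 1
    out = []
    for i in range(n_out):
        pos = i * factor          # fractional read position in the input
        k = int(pos)
        frac = pos - k
        nxt = signal[k + 1] if k + 1 < len(signal) else signal[k]
        out.append((1.0 - frac) * signal[k] + frac * nxt)
    return out


def sine(freq_hz, sample_rate, n_samples, phase=0.3):
    """Test tone; the phase offset keeps zero crossings off the sample grid."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate + phase)
            for n in range(n_samples)]


def count_rising_zero_crossings(signal):
    """Number of negative-to-non-negative transitions (one per period)."""
    return sum(1 for a, b in zip(signal, signal[1:]) if a < 0 <= b)


tone = sine(100.0, 8000, 8000)          # 1 s of a 100 Hz tone at 8 kHz
shifted = resample_linear(tone, 2.0)    # one octave up, half as long
print(len(tone), len(shifted))
print(count_rising_zero_crossings(tone[:4000]),
      count_rising_zero_crossings(shifted))
```

The shifted tone packs twice as many periods into the same number of samples, so a pitch shift obtained this way always comes with a change in duration.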

2014 ◽  
Vol 2014 ◽  
pp. 1-13 ◽  
Author(s):  
Jagannath Nirmal ◽  
Suprava Patnaik ◽  
Mukesh Zaveri ◽  
Pramod Kachare

The complex cepstrum vocoder is used to modify the speaker-specific characteristics of the source speaker's speech to those of the target speaker's speech. Low-time and high-time liftering are used to split the calculated cepstrum into the vocal tract and source excitation parameters. The obtained mixed-phase vocal tract and source excitation parameters with finite impulse response preserve the phase properties of the resynthesized speech frame. A radial basis function is explored to capture the nonlinear mapping function for modifying the complex cepstrum based real and imaginary components of the vocal tract and source excitation of the speech signal. The state-of-the-art Mel cepstrum envelope and the fundamental frequency (F0) are considered to represent the vocal tract and the source excitation of the speech frame, respectively. A radial basis function is used to capture and formulate the nonlinear relations between the Mel cepstrum envelopes of the source and target speakers. A mean and standard deviation approach is employed to modify the fundamental frequency (F0). The Mel log spectral approximation filter is used to reconstruct the speech signal from the modified Mel cepstrum envelope and fundamental frequency. The proposed complex cepstrum based model is compared with the state-of-the-art Mel cepstrum envelope based voice conversion model using objective and subjective evaluations. The evaluation measures reveal that the proposed complex cepstrum based voice conversion system approximates the converted speech signal with better accuracy than the Mel cepstrum envelope based model.
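A mean and standard deviation F0 transformation is commonly written as F0_converted = mean_target + (std_target / std_source) * (F0_source - mean_source). The sketch below applies that formula frame by frame. The function name `convert_f0`, the convention that a value of 0 marks an unvoiced frame, and the choice to work on raw F0 values (rather than log F0, which is also common) are assumptions; the abstract does not state which variant the authors used.

```python
def convert_f0(f0_source, src_mean, src_std, tgt_mean, tgt_std):
    """Map source F0 values onto the target speaker's F0 statistics.

    Frames with F0 == 0 are treated as unvoiced and passed through unchanged
    (an assumed convention, not taken from the abstract).
    """
    if src_std <= 0:
        raise ValueError("src_std must be positive")
    scale = tgt_std / src_std
    return [tgt_mean + scale * (f0 - src_mean) if f0 > 0 else 0.0
            for f0 in f0_source]


# Hypothetical statistics: a lower-pitched source and a higher-pitched target.
converted = convert_f0([100.0, 120.0, 140.0, 0.0],
                       src_mean=120.0, src_std=20.0,
                       tgt_mean=220.0, tgt_std=30.0)
print(converted)
```

In practice the means and standard deviations would be estimated from the voiced frames of each speaker's training data.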


1979 ◽  
Vol 10 (4) ◽  
pp. 246-248 ◽  
Author(s):  
Peter B. Mueller ◽  
Marla Adams ◽  
Jean Baehr-Rouse ◽  
Debbie Boos

Mean fundamental frequencies of male and female subjects obtained with FLORIDA I and a tape striation counting procedure were compared. The fundamental frequencies obtained with these two methods were similar and it appears that the tape striation counting procedure is a viable, simple, and inexpensive alternative to more costly and complicated procedures and instrumentation.


2007 ◽  
Vol 2007 ◽  
pp. 1-5 ◽  
Author(s):  
Aïcha Bouzid ◽  
Noureddine Ellouze

This paper describes a multiscale product method (MPM) for open quotient measurement in voiced speech. The method is based on determining the glottal closing and opening instants. The proposed approach takes the product of wavelet transforms of the speech signal at different scales in order to enhance edge detection and parameter estimation. We show that the proposed method is effective and robust for detecting singularities in speech. Accurate estimation of glottal closing instants (GCIs) and glottal opening instants (GOIs) is important in a wide range of speech processing tasks. In this paper, accurate estimation of GCIs and GOIs is used to measure the local open quotient (Oq), which is the ratio of the open time to the pitch period. The multiscale product operates automatically on the speech signal; the reference electroglottogram (EGG) signal is used for performance evaluation. The rate of correct GCI detection is 95.5% and that of GOI detection is 76%. The pitch period relative error is 2.6% and the open phase relative error is 5.6%. The relative error measured on the open quotient reaches 3% for the whole Keele database.
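Once GCIs and GOIs are available, the local open quotient follows directly from its definition. The sketch below assumes that each glottal cycle runs from one GCI to the next and that the open phase lasts from the GOI inside that cycle to the closing GCI; the function name `open_quotients` and the handling of cycles without exactly one GOI are illustrative choices, and the wavelet-based detection itself is not reproduced.

```python
def open_quotients(gcis, gois):
    """Local open quotient for each glottal cycle.

    gcis, gois: increasing instants (samples or seconds) of glottal closing
    and opening. Returns one value per pair of consecutive GCIs, or None for
    a cycle that does not contain exactly one GOI.
    """
    result = []
    for start, end in zip(gcis, gcis[1:]):
        period = end - start                      # pitch period of the cycle
        inside = [t for t in gois if start < t < end]
        if period <= 0 or len(inside) != 1:
            result.append(None)
            continue
        open_time = end - inside[0]               # GOI up to the next GCI
        result.append(open_time / period)
    return result


# Hypothetical instants in samples: two complete cycles, one missed GOI.
print(open_quotients([0, 100, 200, 300], [40, 150]))
```

Returning None for incomplete cycles keeps missed GOI detections from silently biasing an average Oq.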


Author(s):  
Steven E. Stern ◽  
John W. Mullennix ◽  
Olivier Corneille ◽  
Johanne Huart

Abstract. Corneille, Huart, Becquart, & Brédart (2004) found that people remember ambiguous race faces as closer to a race prototype than they actually are. In three studies, we examined whether this memory bias generalizes to voice memory. In Studies 1 and 2, participants listened to synthesized male and female speech samples (high, moderate, or low pitch) and were asked to identify a voice target when paired against distracters higher or lower in pitch. The results showed that pitch distortions occurred, with the pattern consistent with assimilation toward low and high ends of the pitch continuum. Study 3 replicated this result with a wider voice pitch range. The results parallel those of Corneille et al. (2004). The implications of this work are discussed.


2014 ◽  
Vol 596 ◽  
pp. 433-436 ◽  
Author(s):  
Yao Qi Wang ◽  
Xiao Peng Wang ◽  
Lv Cheng Wang

A new method of pitch detection based on morphological filtering is proposed. The noisy speech signal is passed through a morphological filter to remove noise and highlight the pitch, and then the Hilbert-Huang transform (HHT) is employed to obtain the Hilbert-Huang spectrum and to calculate the instantaneous energy and its derivative. The moments of glottal opening and closing can be accurately located by tracking abrupt changes in the instantaneous energy, so that variation of the pitch period can be accurately tracked. Compared with traditional pitch detection methods, this method not only faithfully describes the non-stationary and nonlinear characteristics of the speech signal, but is also an adaptive process for analyzing it. The experiments showed that the method is strongly robust to noise and can accurately detect the pitch of speech at low SNR.
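The abstract does not say which morphological operators or structuring element the authors used. As a rough sketch of the general idea, the code below builds one-dimensional erosion and dilation with a flat structuring element and averages the open-close and close-open results, a common way to suppress both positive and negative impulsive noise. All function names and the window width are assumptions, and the HHT stage is not covered.

```python
def erode(x, width):
    """Flat 1-D erosion: minimum over a centered window (clipped at edges)."""
    half = width // 2
    return [min(x[max(0, i - half): i + half + 1]) for i in range(len(x))]


def dilate(x, width):
    """Flat 1-D dilation: maximum over a centered window (clipped at edges)."""
    half = width // 2
    return [max(x[max(0, i - half): i + half + 1]) for i in range(len(x))]


def opening(x, width):
    """Erosion then dilation: removes narrow positive spikes."""
    return dilate(erode(x, width), width)


def closing(x, width):
    """Dilation then erosion: removes narrow negative spikes."""
    return erode(dilate(x, width), width)


def morph_smooth(x, width=3):
    """Average of open-close and close-open filtering."""
    oc = closing(opening(x, width), width)
    co = opening(closing(x, width), width)
    return [(a + b) / 2 for a, b in zip(oc, co)]


spiky = [0, 0, 0, 5, 0, 0, 0, -5, 0, 0, 0]
print(morph_smooth(spiky, 3))                      # both spikes are removed
print(morph_smooth([0, 0, 4, 4, 4, 4, 0, 0], 3))   # a wide plateau survives
```

Features narrower than the structuring element are removed while wider ones are preserved, which is the property that lets such a filter strip impulsive noise without flattening the pitch-related structure.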


1982 ◽  
Vol 25 (4) ◽  
pp. 628-630
Author(s):  
Richard Troughear

The errors involved in the use of an analog pitch period detector and a microcomputer to measure jitter and shimmer were explored. A simulation study using sinusoidal waveforms was conducted to ascertain the nature of temporal errors occurring in sampled-signal frequency perturbation studies. The results indicate that, even for jitter-free signals, errors can occur in frequency perturbation measurements; that the magnitude of these errors can be as high as the actual frequency perturbations occurring in steady human vowels; and that the magnitude of the errors is not a function of fundamental frequency but of the remainder of the ratio of signal period to sample period.
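The dependence on the remainder of the period ratio can be reproduced with a simplified model, which is an assumption and not the study's actual apparatus: a perfectly periodic signal whose cycle boundaries are timestamped at the first sample at or after each true boundary. Times are integers in arbitrary ticks (for example microseconds) so the arithmetic is exact; the function names are hypothetical.

```python
def measured_periods(signal_period, sample_period, n_cycles):
    """Cycle lengths seen by a detector limited to the sampling grid.

    signal_period, sample_period: positive integers in the same time unit.
    Each true boundary k * signal_period is rounded up to the next sample.
    """
    stamps = [-((-k * signal_period) // sample_period)   # integer ceiling
              for k in range(n_cycles + 1)]
    return [(b - a) * sample_period for a, b in zip(stamps, stamps[1:])]


def mean_abs_jitter(periods):
    """Mean absolute difference between consecutive measured periods."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return sum(diffs) / len(diffs)


# Period is an exact multiple of the sample period: no apparent jitter.
exact = measured_periods(10000, 100, 5)
# Period leaves a remainder of half a sample: measured periods alternate.
offset = measured_periods(10050, 100, 4)
print(exact, mean_abs_jitter(exact))
print(offset, mean_abs_jitter(offset))
```

In this model a jitter-free signal shows apparent jitter of one sample period whenever the signal period is not a whole number of samples, regardless of the fundamental frequency itself.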

