Increasing the Intelligibility and Naturalness of Alaryngeal Speech Using Voice Conversion and Synthetic Fundamental Frequency

Author(s):  
Tuan Dinh ◽  
Alexander Kain ◽  
Robin Samlan ◽  
Beiming Cao ◽  
Jun Wang
2002 ◽  
Vol 45 (6) ◽  
pp. 1106-1118 ◽  
Author(s):  
M. A. van Rossum ◽  
G. de Krom ◽  
S. G. Nooteboom ◽  
H. Quené

Highly proficient alaryngeal speakers are known to convey prosody successfully. The present study investigated whether alaryngeal speakers not selected on grounds of proficiency were able to convey pitch accent (a pitch accent is realized on the word that is in focus, cf. Bolinger, 1958). The participating speakers (10 tracheoesophageal, 9 esophageal, and 10 laryngeal [control] speakers) produced sentences in which accent was cued by the preceding context. For each utterance, a group of listeners identified which word conveyed accent. All speakers were able to convey accent. Acoustic analyses showed that some alaryngeal speakers had little or no control over fundamental frequency. Contrary to expectation, these speakers did not compensate by using nonmelodic cues, whereas speakers using F0 did use nonmelodic cues. Thus, temporal and intensity cues are concomitant with the use of F0; if F0 is affected, these nonmelodic cues will be as well. A pitch perception experiment confirmed that alaryngeal speakers who had no control over F0 and who did not use nonmelodic cues were nevertheless able to produce pitch movements. Speakers with no control over F0 apparently relied on an alternative pitch system to convey accents and other pitch movements.


1988 ◽  
Vol 53 (1) ◽  
pp. 23-29 ◽  
Author(s):  
Jack Gandour ◽  
Bernd Weinberg ◽  
Soranee Holasuit Petty ◽  
Rochana Dardarananda

The perception and production of linguistic tone was investigated in utterances spoken by Thai alaryngeal speakers. Thai is a tone language with five phonemic tones. High-quality tape recordings of five monosyllabic words produced by 2 esophageal, 1 electrolaryngeal, and 5 normal, native Thai speakers were subjected to perceptual and acoustic analysis. Results from the phonemic identification tests indicated that tones produced by alaryngeal speakers were not only perceived at much lower levels of accuracy than those produced by normal speakers, but the patterns of tonal confusions for alaryngeal speakers were also dissimilar to those for normal speakers. Results from fundamental frequency (F0) analysis revealed that the performance deficit of alaryngeal speakers could be related to specific characteristics of their F0 contours. Findings are interpreted to highlight the importance of (a) language, (b) type of prosody, (c) form of alaryngeal speech, and (d) F0 level and direction on linguistic assessments of F0 control in alaryngeal speech.


1973 ◽  
Vol 38 (1) ◽  
pp. 111-118 ◽  
Author(s):  
Bernd Weinberg ◽  
Jan Westerhouse

Pharyngeal speech represents one of several types of alaryngeal speech; however, its use as a primary method of communication is rare. This report relates the principal findings of an intensive study of a 12-year-old girl with laryngeal papillomatosis who has used pharyngeal speech as an exclusive method of oral communication since age two. The unique physiologic mechanisms of pharyngeal speech are described and differentiated from other forms of alaryngeal speech. This girl’s reduced pharyngeal speech intelligibility for consonant and vowel rhyme-test words, her unfavorable phonation time and maximum phonation duration characteristics, her low average fundamental frequency, and her markedly hoarse pharyngeal voice quality all are distinct vocal liabilities. These findings lend strong support to the hypothesis that pharyngeal speech should not be regarded as a desirable or practical primary method of alaryngeal speech.


2010 ◽  
Vol 61 (1) ◽  
pp. 57-61 ◽  
Author(s):  
Allam Mousa

Voice Conversion Using Pitch Shifting Algorithm by Time Stretching with PSOLA and Re-Sampling

Voice changing has many applications in industry and the commercial field. This paper emphasizes voice conversion using a pitch-shifting method that detects the pitch of the signal (fundamental frequency) using Simplified Inverse Filter Tracking (SIFT), changes it according to the target pitch period using time stretching with the Pitch Synchronous Overlap Add (PSOLA) algorithm, and then resamples the signal in order to have the same play rate. The same study was performed to see the effect of voice conversion when some Arabic speech signal is considered. Treatment of certain Arabic voiced vowels and the conversion between male and female speech have shown some expansion or compression in the resulting speech. A comparison in terms of pitch shifting is presented here. Analysis was performed for a single frame and for a full segmentation of speech.
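The resampling half of this pipeline can be illustrated with a small Python sketch. It is not the paper's implementation: the PSOLA time-stretching step is omitted, a plain linear interpolator stands in for the resampler, and all names (`resample_linear`, `estimate_f0_zero_crossings`, the 100 Hz and 150 Hz values) are hypothetical. The point is only the arithmetic: reading a signal `factor` times faster raises its pitch by `factor` and shortens it by `factor`, so a preceding time stretch by the same factor would restore the original duration.

```python
import math


def resample_linear(signal, factor):
    """Read `signal` `factor` times faster using linear interpolation.

    Played back at the original sampling rate, the result is `factor` times
    higher in pitch and `factor` times shorter in duration.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int((len(signal) - 1) / factor) + 1
    out = []
    for i in range(n_out):
        pos = i * factor                      # fractional read position
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * signal[lo] + frac * signal[hi])
    return out


def estimate_f0_zero_crossings(signal, fs):
    """Crude F0 estimate for a clean sinusoid (positive-going zero crossings)."""
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if a < 0.0 <= b)
    return crossings * fs / len(signal)


# Hypothetical example values: shift a 100 Hz tone toward a 150 Hz target.
fs = 8000
source_f0 = 100.0
target_f0 = 150.0
factor = target_f0 / source_f0

tone = [math.sin(2 * math.pi * source_f0 * n / fs) for n in range(fs)]  # 1 second
shifted = resample_linear(tone, factor)

print("duration ratio:", len(shifted) / len(tone))
print("pitch ratio:", estimate_f0_zero_crossings(shifted, fs)
      / estimate_f0_zero_crossings(tone, fs))
```

In a full system, the pitch estimate would come from a tracker such as SIFT rather than zero crossings, and the signal would first be stretched pitch-synchronously so that the resampled output keeps the original duration.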


2014 ◽  
Vol 2014 ◽  
pp. 1-13 ◽  
Author(s):  
Jagannath Nirmal ◽  
Suprava Patnaik ◽  
Mukesh Zaveri ◽  
Pramod Kachare

The complex cepstrum vocoder is used to modify the speaker-specific characteristics of the source speaker's speech to those of the target speaker's speech. Low-time and high-time liftering are used to split the calculated cepstrum into the vocal tract and the source excitation parameters. The obtained mixed-phase vocal tract and source excitation parameters with finite impulse response preserve the phase properties of the resynthesized speech frame. A radial basis function is explored to capture the nonlinear mapping function for modifying the complex cepstrum based real and imaginary components of the vocal tract and source excitation of the speech signal. The state-of-the-art Mel cepstrum envelope and the fundamental frequency (F0) are considered to represent the vocal tract and the source excitation of the speech frame, respectively. A radial basis function is used to capture and formulate the nonlinear relations between the Mel cepstrum envelopes of the source and target speakers. A mean and standard deviation approach is employed to modify the fundamental frequency (F0). The Mel log spectral approximation filter is used to reconstruct the speech signal from the modified Mel cepstrum envelope and fundamental frequency. The proposed complex cepstrum based model is compared with the state-of-the-art Mel cepstrum envelope based voice conversion model using objective and subjective evaluations. The evaluation measures reveal that the proposed complex cepstrum based voice conversion system approximates the converted speech signal with better accuracy than the model based on the Mel cepstrum envelope.
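The mean and standard deviation approach to F0 modification mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the abstract does not say whether the statistics are computed on linear or log F0, so the log domain is assumed here (a common choice), unvoiced frames are assumed to be marked with 0, and the function names and F0 values are hypothetical.

```python
import math
import statistics


def log_f0_stats(f0_track):
    """Mean and standard deviation of log F0 over voiced frames (F0 > 0)."""
    voiced = [math.log(f) for f in f0_track if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in F0 track")
    return statistics.mean(voiced), statistics.pstdev(voiced)


def convert_f0(f0_track, src_stats, tgt_stats):
    """Map source F0 to the target speaker's log-F0 mean and spread.

    Each voiced frame is normalized by the source statistics and rescaled
    by the target statistics; unvoiced frames (0) are passed through.
    """
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    if sd_s == 0:
        raise ValueError("source log-F0 standard deviation is zero")
    converted = []
    for f in f0_track:
        if f <= 0:
            converted.append(0.0)  # keep unvoiced frames unvoiced
        else:
            z = (math.log(f) - mu_s) / sd_s
            converted.append(math.exp(mu_t + z * sd_t))
    return converted


# Hypothetical F0 tracks in Hz; 0.0 marks an unvoiced frame.
source_f0 = [0.0, 110.0, 120.0, 130.0, 0.0, 115.0, 125.0]
target_f0 = [210.0, 0.0, 230.0, 250.0, 220.0, 240.0]

src_stats = log_f0_stats(source_f0)
tgt_stats = log_f0_stats(target_f0)
converted_f0 = convert_f0(source_f0, src_stats, tgt_stats)
print([round(f, 1) for f in converted_f0])
```

Because the transform is linear in log F0, the converted track keeps the shape of the source contour while matching the target speaker's average pitch and pitch range.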

