Pitch frequency generation system in a speech synthesis system

1992 ◽ Vol 92 (2) ◽ p. 1199
Author(s): Norio Higuchi, Seiichi Yamamoto, Toru Shimizu
2011 ◽ Vol 145 ◽ pp. 441-445
Author(s): Hung Che Shen, Chung Nan Lee

Digital storytelling books are an important knowledge source for blind readers, but creating them usually takes considerable time and effort. To read books aloud from electronic content, automatic procedures can be incorporated into a speech synthesis system. In this paper, we give a practical description of a digital storytelling book generator built from a free Text-to-Speech (TTS) program combined with a MIDI-to-Singing toolkit. A degree of emotional TTS customization is obtained through time-pitch manipulation of the synthesized acoustic waveform. MIDI-to-Singing voices can be generated automatically, with special emphasis on lyrical or storytelling-styled content that is usually undermined by the flat, uninteresting quality of voices synthesized by traditional TTS programs. Rule-based approaches generate time-pitch values from rules that describe how the pitch frequency behaves over time; the pitch values fluctuate within a range that depends on the intended emotion. The MIDI-to-Singing voice synthesis maps the pitch frequency values onto the 12-semitone melodic scale and extracts semitone intervals for each emotional state. In the current version of the system, a user can style the synthesized voice by selecting a standard male or female voice in combination with one of 12 predefined expressive styles: Neutral, Monotonic, Lowly-pitched, Highly-pitched, Rising-pitched, Falling-pitched, Happy, Sad, Fear, Anger, Randomly-pitched, and Melody-aligning (singing), the last using a small set of musical notes. A subjective test shows that synthetic conversations based on MIDI-to-Singing with customized styles are preferred as more natural, intelligible, and enjoyable than traditional ones. Finally, the resulting digital talking recordings can be heard on the website for comparison between human speech and MIDI-to-Singing synthesized speech.
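The abstract does not give an implementation, but the semitone mapping it describes is standard 12-tone equal-temperament quantization. The following is a minimal Python sketch under stated assumptions: A4_HZ = 440 Hz as the reference pitch, an F0 contour given in Hz per frame with 0.0 marking unvoiced frames, and a STYLE_TRANSFORMS table whose shift amounts (4 semitones, linear glides) are hypothetical illustrations of a few of the named styles, not the paper's actual rules.

import math

A4_HZ = 440.0  # assumed reference pitch; the abstract does not state one

def hz_to_semitone(f0_hz):
    """Quantize a pitch frequency (Hz) onto the 12-tone equal-temperament
    scale, returning a semitone index relative to A4."""
    return round(12.0 * math.log2(f0_hz / A4_HZ))

def semitone_to_hz(semitone):
    """Inverse mapping: semitone index (relative to A4) back to Hz."""
    return A4_HZ * 2.0 ** (semitone / 12.0)

def semitone_intervals(f0_contour_hz):
    """Quantize an F0 contour and return the intervals, in semitones,
    between consecutive voiced frames (unvoiced frames, f0 == 0, skipped)."""
    semis = [hz_to_semitone(f) for f in f0_contour_hz if f > 0]
    return [b - a for a, b in zip(semis, semis[1:])]

# Hypothetical transforms for a few of the named styles; the shift sizes
# are illustrative only.  Each takes (f0, t, mean_f0), where t runs 0..1
# over the utterance so the glide styles can evolve in time.
STYLE_TRANSFORMS = {
    "Neutral":         lambda f, t, mean: f,
    "Monotonic":       lambda f, t, mean: mean,                    # flatten to mean pitch
    "Highly-pitched":  lambda f, t, mean: f * 2 ** (4 / 12),       # up 4 semitones
    "Lowly-pitched":   lambda f, t, mean: f * 2 ** (-4 / 12),      # down 4 semitones
    "Rising-pitched":  lambda f, t, mean: f * 2 ** (4 * t / 12),   # glide upward
    "Falling-pitched": lambda f, t, mean: f * 2 ** (-4 * t / 12),  # glide downward
}

def apply_style(f0_contour_hz, style):
    """Return a restyled F0 contour; unvoiced frames (0.0) pass through."""
    voiced = [f for f in f0_contour_hz if f > 0]
    mean_f0 = sum(voiced) / len(voiced)
    n = len(f0_contour_hz)
    transform = STYLE_TRANSFORMS[style]
    return [transform(f, i / max(n - 1, 1), mean_f0) if f > 0 else 0.0
            for i, f in enumerate(f0_contour_hz)]

if __name__ == "__main__":
    contour = [220.0, 0.0, 233.1, 246.9, 261.6]  # A3..C4 with one unvoiced frame
    print(semitone_intervals(contour))           # -> [1, 1, 1]
    print(apply_style(contour, "Monotonic"))     # flat contour at the mean F0

Quantizing to semitones before taking intervals is what makes the melodic representation robust to small F0 estimation errors, and working in intervals rather than absolute notes keeps the extracted emotional patterns independent of the speaker's pitch range.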


Author(s): S.J. Eady, T.M.S. Hemphill, J.R. Woolsey, J.A.W. Clayards

1995 ◽ Vol 18 (2) ◽ pp. 141-158
Author(s): Marshall H. Raskind, Eleanor Higgins

This study investigated the effects of speech synthesis on the proofreading efficiency of postsecondary students with learning disabilities. Subjects proofread self-generated written language samples under three conditions: (a) using a speech synthesis system that simultaneously highlighted and “spoke” words on a computer monitor, (b) having the text read aloud to them by another person, and (c) receiving no assistance. Using the speech synthesis system enabled subjects to detect a significantly higher percentage of total errors than either of the other two proofreading conditions. In addition, subjects were able to locate a significantly higher percentage of capitalization, spelling, usage, and typographical errors under the speech synthesis condition. However, subjects found a significantly higher percentage of “grammar-mechanical” errors when the text was read aloud by another person than under the other two conditions. Results are discussed with regard to the underlying reasons for the overall superior performance of the speech synthesis system and the implications of using speech synthesis as a compensatory writing aid for postsecondary students with learning disabilities.


Author(s): Jesin James, Isabella Shields, Rebekah Berriman, Peter J. Keegan, Catherine I. Watson
