Perception of Standard Arabic Synthetic Speech Rate

Author(s):  
Yahya Aldholmi ◽  
Rawan Aldhafyan ◽  
Asma Alqahtani

2021 ◽  
Vol 14 (3) ◽  
pp. 1-26
Author(s):  
Danielle Bragg ◽  
Katharina Reinecke ◽  
Richard E. Ladner

As conversational agents and digital assistants become increasingly pervasive, understanding their synthetic speech becomes increasingly important. Simultaneously, speech synthesis is becoming more sophisticated and manipulable, providing the opportunity to optimize speech rate to save users time. However, little is known about people’s abilities to understand fast speech. In this work, we provide an extension of the first large-scale study on human listening rates, enlarging the prior study run with 453 participants to 1,409 participants and adding new analyses on this larger group. Run on LabintheWild, it used volunteer participants, was screen reader accessible, and measured listening rate by accuracy at answering questions spoken by a screen reader at various rates. Our results show that people who are visually impaired, who often rely on audio cues and access text aurally, generally have higher listening rates than sighted people. The findings also suggest a need to expand the range of rates available on personal devices. These results demonstrate the potential for users to learn to listen to faster rates, expanding the possibilities for human-conversational agent interaction.


1989 ◽  
Vol 33 ◽  
pp. 89-94
Author(s):  
Hugo Quené

Text-to-speech systems generally consist of two components. The first converts the input text to an abstract, linguistically relevant representation. Usually, this is a phoneme representation of the input text, with markers for (word, morpheme, syllable) boundaries, word stress, and sentence accent. The second component converts this transcription into a physical speech sound. Two aspects of natural speech are most important to imitate in this latter step: (a) natural prosody (speech rate, segment duration, pitch, etc.), and (b) representation of phonetic adjustment between phonemes. The resulting synthetic speech is mainly used in special-purpose applications, although a wider use is foreseen for the future.


1987 ◽  
Vol 31 (9) ◽  
pp. 961-965
Author(s):  
Monica A. Merva ◽  
Beverly H. Williges

Two studies were conducted to explore the effects of various parameters on rule-based synthetic speech intelligibility. Experiment I examined the effect of situational context clues and speech rate on synthesized speech intelligibility. Subjects who received pragmatic context information prior to each message had transcription error rates 50% lower than those who received no context information. Speech rates of 250 words per minute (wpm) yielded significantly more transcription errors than rates of 180 wpm. In Experiment II, the effects of speech rate, message repetition, and location of information in a message were examined. Transcription accuracy was best for messages spoken at 150 or 180 wpm and for messages repeated either twice or three times. Words at the end of messages were transcribed more accurately than words at the beginning of messages. Subjective ratings indicated that subjects were aware of errors when incorrectly transcribing a message even though no feedback was provided.


Author(s):  
Louisa M. Slowiaczek ◽  
Howard C. Nusbaum

The increased use of voice-response systems has resulted in a greater need for systematic evaluation of the role of segmental and suprasegmental factors in determining the intelligibility of synthesized speech. Two experiments were conducted to examine the effects of pitch contour and speech rate on the perception of synthetic speech. In Experiment 1, subjects transcribed sentences that were either syntactically correct and meaningful or syntactically correct but semantically anomalous. In Experiment 2, subjects transcribed sentences that varied in length and syntactic structure. In both experiments a text-to-speech system generated synthetic speech at either 150 or 250 words/min. Half of the test sentences were generated with a flat pitch (monotone) and half were generated with normally inflected clausal intonation. The results indicate that the identification of words in fluent synthetic speech is influenced by speaking rate, meaning, length, and, to a lesser degree, pitch contour. The results suggest that in many applied situations the perception of the segmental information in the speech signal may be more critical to the intelligibility of synthesized speech than are suprasegmental factors.


2019 ◽  
Vol 28 (2S) ◽  
pp. 875-886
Author(s):  
Jennifer M. Vojtech ◽  
Jacob P. Noordzij ◽  
Gabriel J. Cler ◽  
Cara E. Stepp

Purpose: This study investigated how modulating fundamental frequency (f0) and speech rate differentially impact the naturalness, intelligibility, and communication efficiency of synthetic speech. Method: Sixteen sentences of varying prosodic content were developed via a speech synthesizer. The f0 contour and speech rate of these sentences were altered to produce 4 stimulus sets: (a) normal rate with a fixed f0 level, (b) slow rate with a fixed f0 level, (c) normal rate with prosodically natural f0 variation, and (d) normal rate with prosodically unnatural f0 variation. Sixteen listeners provided orthographic transcriptions and judgments of naturalness for these stimuli. Results: Sentences with f0 variation were rated as more natural than those with a fixed f0 level. Conversely, sentences with a fixed f0 level demonstrated higher intelligibility than those with f0 variation. Speech rate did not affect the intelligibility of stimuli with a fixed f0 level. Communication efficiency was highest for sentences produced at a normal rate and a fixed f0 level. Conclusions: Sentence-level f0 variation increased naturalness ratings of synthesized speech, whether the variation was prosodically natural or not. However, these f0 variations reduced intelligibility. There is evidence of a trade-off in naturalness and intelligibility of synthesized speech, which may impact future speech synthesis designs. Supplemental Material: https://doi.org/10.23641/asha.8847833


2010 ◽  
Vol 20 (1) ◽  
pp. 20-25
Author(s):  
Jim Tsiamtsiouris ◽  
Kim Krieger

The purpose of this study was to test the hypothesis that adults who stutter will exhibit significant improvements after attending a residential, 3-week intensive program that focuses on avoidance reduction and stuttering modification therapy. Preliminary analyses focused on four measures: (a) SSI-3, (b) speech rate, (c) S-24 Scale, and (d) OASES. Results indicated significant improvements on all of the measures.


Author(s):  
James Dickins ◽  
Janet C. E. Watson

2008 ◽  
Author(s):  
Kimberly M. Fenn ◽  
Daniel Margoliash ◽  
Howard C. Nusbaum
