Effects of Speech Rate and Pitch Contour on the Perception of Synthetic Speech

The increased use of voice-response systems has resulted in a greater need for systematic evaluation of the role of segmental and suprasegmental factors in determining the intelligibility of synthesized speech. Two experiments were conducted to examine the effects of pitch contour and speech rate on the perception of synthetic speech. In Experiment 1, subjects transcribed sentences that were either syntactically correct and meaningful or syntactically correct but semantically anomalous. In Experiment 2, subjects transcribed sentences that varied in length and syntactic structure. In both experiments a text-to-speech system generated synthetic speech at either 150 or 250 words/min. Half of the test sentences were generated with a flat pitch (monotone) and half were generated with normally inflected clausal intonation. The results indicate that the identification of words in fluent synthetic speech is influenced by speaking rate, meaning, length, and, to a lesser degree, pitch contour. The results suggest that in many applied situations the perception of the segmental information in the speech signal may be more critical to the intelligibility of synthesized speech than are suprasegmental factors.

The Processing of Synthetic Speech by Older and Younger Adults

Proceedings of the Human Factors Society Annual Meeting ◽

10.1177/154193129203600211 ◽

1992 ◽

Vol 36 (2) ◽

pp. 190-192 ◽

Cited By ~ 3

Author(s):

Janan Al-Awar Smither

Keyword(s):

Short Term Memory ◽

Memory Task ◽

The Elderly ◽

Synthetic Speech ◽

Text To Speech ◽

Short Term ◽

Younger Adults ◽

Term Memory ◽

Response Systems ◽

Older Subjects

This experiment investigated the demands synthetic speech places on short term memory by comparing performance of old and young adults on an ordinary short term memory task. Items presented were generated by a human speaker or by a text-to-speech computer synthesizer. Results were consistent with the idea that the comprehension of synthetic speech imposes increased resource demands on the short term memory system. Older subjects performed significantly more poorly than younger subjects, and both groups performed more poorly with synthetic than with human speech. Findings suggest that short term memory demands imposed by the processing of synthetic speech should be investigated further, particularly regarding the implementation of voice response systems in devices for the elderly.

Effect of Text-to-Speech Rate on Reading Comprehension by Adults With Aphasia

American Journal of Speech-Language Pathology ◽

10.1044/2019_ajslp-19-00047 ◽

2020 ◽

Vol 29 (1) ◽

pp. 168-184 ◽

Cited By ~ 1

Author(s):

Karen Hux ◽

Jessica A. Brown ◽

Sarah Wallace ◽

Kelly Knollman-Porter ◽

Anna Saylor ◽

...

Keyword(s):

Performance Feedback ◽

Fast Rate ◽

Speech Rate ◽

Speaking Rate ◽

Auditory Presentation ◽

Text To Speech ◽

Speech Output ◽

Slow Presentation ◽

Listening Strategies ◽

Comprehension Accuracy

Purpose: Accessing auditory and written material simultaneously benefits people with aphasia; however, the extent of benefit as well as people's preferences and experiences may vary given different auditory presentation rates. This study's purpose was to determine how 3 text-to-speech rates affect comprehension when adults with aphasia access newspaper articles through combined modalities. Secondary aims included exploring time spent reviewing written texts after speech output cessation, rate preference, preference consistency, and participant rationales for preferences. Method: Twenty-five adults with aphasia read and listened to passages presented at slow (113 words per minute [wpm]), medium (154 wpm), and fast (200 wpm) rates. Participants answered comprehension questions, selected most and least preferred rates following the 1st and 3rd experimental sessions and after receiving performance feedback, and explained rate preferences and reading and listening strategies. Results: Comprehension accuracy did not vary significantly across presentation rates, but reviewing time after cessation of auditory content did. Visual data inspection revealed that, in particular, participants with substantial extra reviewing time took longer given fast than medium or slow presentation. Regardless of exposure amount or receipt of performance feedback, participants most preferred the medium rate and least preferred the fast rate; rationales centered on reading and listening synchronization, benefits to comprehension, and perceived normality of speaking rate. Conclusion: As a group, people with aphasia most preferred and were most efficient given a text-to-speech rate around 150 wpm when processing dual modality content; individual differences existed, however, and mandate attention to personal preferences and processing strengths.

Sprekende Computers

Toegepaste Taalwetenschap in Artikelen ◽

10.1075/ttwia.33.12que ◽

1989 ◽

Vol 33 ◽

pp. 89-94

Author(s):

Hugo Quené

Keyword(s):

Speech Rate ◽

Speech Sound ◽

Natural Speech ◽

Synthetic Speech ◽

Text To Speech ◽

Word Stress ◽

Input Text ◽

The Future ◽

Segment Duration

Text-to-speech systems generally consist of two components. The first one converts the input text to an abstract, linguistically relevant representation. Usually, this is a phoneme representation of the input text, with markers for (word, morpheme, syllable) boundaries, word stress, and sentence accent. The second component converts this transcription into a physical speech sound. Two aspects of natural speech are most important to imitate in this latter step: (a) natural prosody (speech rate, segment duration, pitch, etc.), and (b) representation of phonetic adjustment between phonemes. The resulting synthetic speech is mainly used in special-purpose applications, although a wider use is foreseen for the future.
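
The two-component design described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the toy lexicon, the marker symbols (`#` for word boundary, `.` for syllable boundary, `'` for stress), the function names, and the duration and pitch values. The second stage stops at prosodic targets (duration and pitch per segment) and does not produce audio.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon: word -> list of (phonemes, stressed) syllables.
LEXICON = {
    "hello": [(["h", "@"], False), (["l", "oU"], True)],
    "world": [(["w", "3`", "l", "d"], True)],
}


@dataclass
class Segment:
    phoneme: str
    duration_ms: float
    pitch_hz: float


def text_to_transcription(text):
    """Component 1: input text -> phoneme symbols with boundary and stress markers."""
    symbols = []
    for word in text.lower().split():
        syllables = LEXICON[word]  # raises KeyError for words outside the toy lexicon
        symbols.append("#")  # word boundary
        for i, (phonemes, stressed) in enumerate(syllables):
            if i > 0:
                symbols.append(".")  # syllable boundary
            if stressed:
                symbols.append("'")  # stress marker applies to the following syllable
            symbols.extend(phonemes)
    symbols.append("#")
    return symbols


def transcription_to_segments(symbols, rate_wpm=150.0, base_pitch_hz=110.0):
    """Component 2 (prosody only): assign a duration and pitch target to each phoneme."""
    # Assumed baseline: 80 ms per phoneme at 150 wpm, scaled inversely with rate.
    base_ms = 80.0 * 150.0 / rate_wpm
    segments = []
    stressed = False
    for sym in symbols:
        if sym in ("#", "."):
            stressed = False  # stress ends at any boundary
        elif sym == "'":
            stressed = True
        else:
            # Assumed stress effect: longer and higher-pitched segments.
            duration = base_ms * (1.3 if stressed else 1.0)
            pitch = base_pitch_hz * (1.2 if stressed else 1.0)
            segments.append(Segment(sym, duration, pitch))
    return segments


transcription = text_to_transcription("hello world")
segments = transcription_to_segments(transcription, rate_wpm=150.0)
```

Keeping the transcription as an explicit intermediate value mirrors the separation the abstract describes: the first stage handles only linguistic analysis, and the second handles only acoustic realization.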

Context, Repetition and Synthesized Speech Intelligibility

Proceedings of the Human Factors Society Annual Meeting ◽

10.1177/154193128703100907 ◽

1987 ◽

Vol 31 (9) ◽

pp. 961-965 ◽

Cited By ~ 1

Author(s):

Monica A. Merva ◽

Beverly H. Williges

Keyword(s):

Speech Intelligibility ◽

Speech Rate ◽

Error Rates ◽

Synthetic Speech ◽

Context Information ◽

Subjective Ratings ◽

Rule Based ◽

Situational Context ◽

Synthesized Speech ◽

Transcription Error

Two studies were conducted to explore the effects of various parameters on rule-based synthetic speech intelligibility. Experiment I examined the effect of situational context clues and speech rate on synthesized speech intelligibility. Subjects who received pragmatic context information prior to each message had transcription error rates 50% lower than those who received no context information. Speech rates of 250 words per minute (wpm) yielded significantly more transcription errors than rates of 180 wpm. In Experiment II, the effects of speech rate, message repetition, and location of information in a message were examined. Transcription accuracy was best for messages spoken at 150 or 180 wpm and for messages repeated either twice or three times. Words at the end of messages were transcribed more accurately than words at the beginning of messages. Subjective ratings indicated that subjects were aware of errors when incorrectly transcribing a message even though no feedback was provided.

The Effects of Modulating Fundamental Frequency and Speech Rate on the Intelligibility, Communication Efficiency, and Perceived Naturalness of Synthetic Speech

American Journal of Speech-Language Pathology ◽

10.1044/2019_ajslp-msc18-18-0052 ◽

2019 ◽

Vol 28 (2S) ◽

pp. 875-886 ◽

Cited By ~ 1

Author(s):

Jennifer M. Vojtech ◽

Jacob P. Noordzij ◽

Gabriel J. Cler ◽

Cara E. Stepp

Keyword(s):

Fundamental Frequency ◽

Slow Rate ◽

Speech Synthesis ◽

Speech Rate ◽

Synthetic Speech ◽

Normal Rate ◽

Synthesized Speech ◽

Sentence Level ◽

Communication Efficiency ◽

F0 Contour

Purpose: This study investigated how modulating fundamental frequency (f0) and speech rate differentially impact the naturalness, intelligibility, and communication efficiency of synthetic speech. Method: Sixteen sentences of varying prosodic content were developed via a speech synthesizer. The f0 contour and speech rate of these sentences were altered to produce 4 stimulus sets: (a) normal rate with a fixed f0 level, (b) slow rate with a fixed f0 level, (c) normal rate with prosodically natural f0 variation, and (d) normal rate with prosodically unnatural f0 variation. Sixteen listeners provided orthographic transcriptions and judgments of naturalness for these stimuli. Results: Sentences with f0 variation were rated as more natural than those with a fixed f0 level. Conversely, sentences with a fixed f0 level demonstrated higher intelligibility than those with f0 variation. Speech rate did not affect the intelligibility of stimuli with a fixed f0 level. Communication efficiency was highest for sentences produced at a normal rate and a fixed f0 level. Conclusions: Sentence-level f0 variation increased naturalness ratings of synthesized speech, whether the variation was prosodically natural or not. However, these f0 variations reduced intelligibility. There is evidence of a trade-off in naturalness and intelligibility of synthesized speech, which may impact future speech synthesis designs. Supplemental Material: https://doi.org/10.23641/asha.8847833

The role of feedback in perceptual learning of synthetic speech

PsycEXTRA Dataset ◽

10.1037/e537102012-252 ◽

2001 ◽

Author(s):

Howard Nusbaum ◽

Kimberly Fenn

Keyword(s):

Perceptual Learning ◽

Synthetic Speech

THEMATIC ROLE ASSIGNMENT IN ENGLISH SENTENCES: A QUICK GLANCE AT AN INTERFACE BETWEEN SYNTAX AND SEMANTICS

ELTICS : Journal of English Language Teaching and English Linguistics ◽

10.31316/eltics.v2i1.385 ◽

2019 ◽

Vol 2 (1) ◽

Author(s):

Eni Maharsi

Keyword(s):

Syntactic Structure ◽

Thematic Roles ◽

Thematic Role ◽

Role Assignment

This paper examines the role of elements of English sentences by employing the approach of thematic role assignment. The emphasis is on how the positioning of words and phrases in syntactic structure helps determine the roles that the referents of NPs play in the situation described by the sentences. The results reveal that the position of an NP determines its thematic role. There is a relationship between deep syntactic structure and the assignment of thematic roles for every NP in the sentence.

On cross-dialect and speaker-adaptation of speaking rate-dependent hierarchical prosodic model for a Hakka text-to-speech system

10.21437/speechprosody.2016-161 ◽

2016 ◽

Author(s):

Chen-Yu Chiang ◽

Hsiu-Min Yu ◽

Sin-Horng Chen

Keyword(s):

Speaking Rate ◽

Text To Speech ◽

Rate Dependent

A systematic review and meta-analysis of studies on metaphor comprehension in individuals with autism spectrum disorder: Do task properties matter?

Applied Psycholinguistics ◽

10.1017/s0142716419000328 ◽

2019 ◽

Vol 40 (6) ◽

pp. 1421-1454 ◽

Cited By ~ 8

Author(s):

Tamar Kalandadze ◽

Valentina Bambini ◽

Kari-Anne B. Næss

Keyword(s):

Systematic Review ◽

Autism Spectrum Disorder ◽

Syntactic Structure ◽

Meta Analysis ◽

Autism Spectrum ◽

Spectrum Disorder ◽

Response Format ◽

Metaphor Comprehension ◽

The Relationship

Individuals with autism spectrum disorder (ASD) often experience difficulty in comprehending metaphors compared to individuals with typical development (TD). However, there is a large variation in the results across studies, possibly related to the properties of the metaphor tasks. This preregistered systematic review and meta-analysis (a) explored the properties of the metaphor tasks used in ASD research, and (b) investigated the group difference between individuals with ASD and TD on metaphor comprehension, as well as the relationship between the task properties and any between-study variation. A systematic search was undertaken in seven relevant databases. Fourteen studies fulfilled our predetermined inclusion criteria. Across tasks, we detected four types of response format and a great variety of metaphors in terms of familiarity, syntactic structure, and linguistic context. Individuals with TD outperformed individuals with ASD on metaphor comprehension (Hedges’ g = −0.63). Verbal explanation response format was utilized in the study showing the largest effect size in the group comparison. However, due to the sparse experimental manipulations, the role of task properties could not be established. Future studies should consider and report task properties to determine their role in metaphor comprehension, and to inform experimental paradigms as well as educational assessment.
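
For readers unfamiliar with the effect size reported in this abstract, Hedges' g is commonly defined as a standardized mean difference with a small-sample bias correction. The form below is the standard textbook definition, not a formula taken from the review itself. The group subscripts and the direction of subtraction (ASD minus TD, under which a negative value means lower ASD scores) are assumptions made for illustration.

```latex
g = J \cdot \frac{\bar{x}_{\mathrm{ASD}} - \bar{x}_{\mathrm{TD}}}{s_p},
\qquad
s_p = \sqrt{\frac{(n_{\mathrm{ASD}} - 1)\, s_{\mathrm{ASD}}^{2} + (n_{\mathrm{TD}} - 1)\, s_{\mathrm{TD}}^{2}}{n_{\mathrm{ASD}} + n_{\mathrm{TD}} - 2}},
\qquad
J \approx 1 - \frac{3}{4\,(n_{\mathrm{ASD}} + n_{\mathrm{TD}}) - 9}
```

Here \bar{x}, s, and n denote each group's mean, standard deviation, and sample size, s_p is the pooled standard deviation, and J is the usual approximate correction factor.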

The future role of text to speech synthesis in automated services

10.1049/ic:19970799 ◽

1997 ◽

Author(s):

A.P. Breen

Keyword(s):

Speech Synthesis ◽

Text To Speech ◽

Future Role ◽

The Future ◽

Text To Speech Synthesis
