Intra-Sentential Speaking Rate Control in Neural Text-To-Speech for Automatic Dubbing

Author(s):  
Mayank Sharma ◽  
Yogesh Virkar ◽  
Marcello Federico ◽  
Roberto Barra-Chicote ◽  
Robert Enyedi


1981 ◽  
Vol 46 (4) ◽  
pp. 398-404 ◽  
Author(s):  
Kathryn M. Yorkston ◽  
David R. Beukelman

Treatment programs for four improving ataxic dysarthric speakers are reviewed. Treatment sequences were based on two overall measures of speech performance: intelligibility and prosody. Increases in intelligibility were initially achieved by control of speaking rate. A hierarchy of rate control strategies, ranging from rigid imposition of rate through rhythmic cueing to self-monitored rate control, is discussed. As speakers improved their monitoring skills, a compromise was made between intelligibility and rate. Normal prosodic patterns were not achieved by the ataxic speakers because of difficulty in precisely coordinating the subtle fundamental frequency, loudness, and timing adjustments needed to signal stress. Three of the four subjects were taught to use only durational adjustments to signal stress. In this way, they were able to achieve stress on targeted words consistently and to minimize the bizarreness that resulted from sweeping changes in fundamental frequency and bursts of loudness. The need for further clinically oriented research is discussed.


1989 ◽  
Vol 32 (2) ◽  
pp. 419-431 ◽  
Author(s):  
Roger J. Ingham ◽  
Janis Costello Ingham ◽  
Mark Onslow ◽  
Patrick Finn

Using single-subject experiments with 3 adult stutterers, this study evaluated how instructing stutterers to rate and modify how natural their speech sounded affected experimenters' ratings of speech naturalness, stuttering frequency, and speaking rate. The study also investigated the reliability of stutterers' and listeners' naturalness ratings. The stutterers were partway through a therapy program using prolonged speech or rate control. Results showed that the stutterers could modify their speech so that their naturalness ratings increased or decreased, and that these changes were independent of stuttering or speaking rate. Experimenters' ratings of speech naturalness were unchanged in conditions where stutterers judged their speech to sound more natural, but paralleled the stutterers' ratings when they judged their speech to sound more unnatural. An attempt to determine whether stutterers rated how natural their speech sounded differently from how natural it felt showed differences for one stutterer. Reratings of randomized session recordings by experimenters and independent judges showed that their ratings were highly reliable. When the same randomized session recordings were rerated by the stutterers (1 and 3 months after the experiment), their judgments of changes in their speech naturalness, which were not found in the experimenters' ratings, remained consistent and reliable.


2020 ◽  
Vol 29 (1) ◽  
pp. 168-184 ◽  
Author(s):  
Karen Hux ◽  
Jessica A. Brown ◽  
Sarah Wallace ◽  
Kelly Knollman-Porter ◽  
Anna Saylor ◽  
...  

Purpose: Accessing auditory and written material simultaneously benefits people with aphasia; however, the extent of benefit, as well as people's preferences and experiences, may vary with the auditory presentation rate. This study's purpose was to determine how three text-to-speech rates affect comprehension when adults with aphasia access newspaper articles through combined modalities. Secondary aims included exploring time spent reviewing written texts after speech output ceased, rate preference, preference consistency, and participants' rationales for their preferences.

Method: Twenty-five adults with aphasia read and listened to passages presented at slow (113 words per minute [wpm]), medium (154 wpm), and fast (200 wpm) rates. Participants answered comprehension questions, selected their most and least preferred rates following the first and third experimental sessions and after receiving performance feedback, and explained their rate preferences and reading and listening strategies.

Results: Comprehension accuracy did not vary significantly across presentation rates, but reviewing time after cessation of auditory content did. Visual inspection of the data revealed that participants with substantial extra reviewing time, in particular, took longer after fast than after medium or slow presentation. Regardless of amount of exposure or receipt of performance feedback, participants most preferred the medium rate and least preferred the fast rate; rationales centered on synchronization of reading and listening, benefits to comprehension, and perceived normality of speaking rate.

Conclusion: As a group, people with aphasia most preferred, and were most efficient given, a text-to-speech rate around 150 wpm when processing dual-modality content; individual differences existed, however, and mandate attention to personal preferences and processing strengths.


Author(s):  
Louisa M. Slowiaczek ◽  
Howard C. Nusbaum

The increased use of voice-response systems has resulted in a greater need for systematic evaluation of the role of segmental and suprasegmental factors in determining the intelligibility of synthesized speech. Two experiments were conducted to examine the effects of pitch contour and speech rate on the perception of synthetic speech. In Experiment 1, subjects transcribed sentences that were either syntactically correct and meaningful or syntactically correct but semantically anomalous. In Experiment 2, subjects transcribed sentences that varied in length and syntactic structure. In both experiments a text-to-speech system generated synthetic speech at either 150 or 250 words/min. Half of the test sentences were generated with a flat pitch (monotone) and half were generated with normally inflected clausal intonation. The results indicate that the identification of words in fluent synthetic speech is influenced by speaking rate, meaning, length, and, to a lesser degree, pitch contour. The results suggest that in many applied situations the perception of the segmental information in the speech signal may be more critical to the intelligibility of synthesized speech than are suprasegmental factors.


2020 ◽  
Vol 63 (1) ◽  
pp. 59-73 ◽  
Author(s):  
Panying Rong

Purpose: The purpose of this article was to validate a novel acoustic analysis of oral diadochokinesis (DDK) in assessing bulbar motor involvement in amyotrophic lateral sclerosis (ALS).

Method: An automated acoustic DDK analysis was developed, which filtered out the voice features and extracted the envelope of the acoustic waveform reflecting the temporal pattern of syllable repetitions during an oral DDK task (i.e., repetitions of /tɑ/ at the maximum rate on one breath). Cycle-to-cycle temporal variability (cTV) of envelope fluctuations and syllable repetition rate (sylRate) were derived from the envelope and validated against two kinematic measures, tongue movement jitter (movJitter) and alternating tongue movement rate (AMR) during the DDK task, in 16 individuals with bulbar ALS and 18 healthy controls. After the validation, cTV, sylRate, movJitter, and AMR, along with an established clinical speech measure, speaking rate (SR), were compared in their ability to (a) differentiate individuals with ALS from healthy controls and (b) detect early-stage bulbar declines in ALS.

Results: cTV and sylRate were significantly correlated with movJitter and AMR, respectively, across individuals with ALS and healthy controls, confirming the validity of the acoustic DDK analysis in extracting the temporal DDK pattern. Among all the acoustic and kinematic DDK measures, cTV showed the highest diagnostic accuracy (0.87), with 80% sensitivity and 94% specificity in differentiating individuals with ALS from healthy controls, outperforming the SR measure. Moreover, cTV showed a large increase during the early disease stage, which preceded the decline of SR.

Conclusions: This study provided preliminary validation of a novel automated acoustic DDK analysis in extracting a useful measure, cTV, for early detection of bulbar ALS. This analysis overcame a major barrier in existing acoustic DDK analyses: continuous voicing between syllables that interferes with syllable structures. The approach has potential clinical applications as a novel bulbar assessment.
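The envelope-based pipeline described in this abstract (extract an amplitude envelope, detect syllable cycles, derive sylRate and cTV) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the smoothing method, the peak-detection threshold, and the exact cTV formula (here taken as the mean absolute difference between successive cycle durations, in milliseconds) are all assumptions for demonstration.

```python
import numpy as np

def amplitude_envelope(signal, fs, cutoff_hz=10.0):
    # Rectify, then smooth with a moving average roughly acting as a
    # low-pass filter near cutoff_hz (a simplification of the paper's
    # unspecified filtering step).
    rectified = np.abs(signal)
    win = max(1, int(fs / cutoff_hz))
    kernel = np.ones(win) / win
    return np.convolve(rectified, kernel, mode="same")

def syllable_onsets(envelope, fs, min_gap_s=0.1):
    # Treat rising crossings of half the envelope maximum as syllable
    # onsets, keeping only onsets separated by at least min_gap_s.
    above = envelope > 0.5 * envelope.max()
    onsets = np.flatnonzero(above[1:] & ~above[:-1]) + 1
    kept = [onsets[0]] if len(onsets) else []
    for i in onsets[1:]:
        if (i - kept[-1]) / fs >= min_gap_s:
            kept.append(i)
    return np.array(kept)

def ddk_measures(signal, fs):
    env = amplitude_envelope(signal, fs)
    onsets = syllable_onsets(env, fs)
    durations = np.diff(onsets) / fs          # cycle durations in seconds
    syl_rate = len(onsets) / (len(signal) / fs)   # syllables per second
    # Assumed cTV proxy: mean absolute successive difference of cycle
    # durations, expressed in milliseconds.
    ctv = float(np.mean(np.abs(np.diff(durations))) * 1000.0)
    return syl_rate, ctv

# Synthetic demo: ten amplitude bursts at 5 Hz modulating a 200 Hz carrier,
# mimicking regular /tɑ/ repetitions over 2 seconds.
fs = 8000
t = np.arange(0, 2.0, 1 / fs)
burst = (np.sin(2 * np.pi * 5 * t) > 0.95).astype(float)
sig = burst * np.sin(2 * np.pi * 200 * t)
rate, ctv = ddk_measures(sig, fs)
```

On this perfectly periodic input the detected rate is about 5 syllables per second and cTV is near zero; irregular (e.g., ALS-like) syllable timing would inflate cTV while leaving the mean rate relatively unchanged, which is the contrast the abstract's diagnostic comparison relies on.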


2012 ◽  
Vol 21 (2) ◽  
pp. 60-71 ◽  
Author(s):  
Ashley Alliano ◽  
Kimberly Herriger ◽  
Anthony D. Koutsoftas ◽  
Theresa E. Bartolotta

Using the iPad tablet for Augmentative and Alternative Communication (AAC) purposes can facilitate many communicative needs, is cost-effective, and is socially acceptable. Many individuals with communication difficulties can use iPad applications (apps) to augment communication, provide an alternative form of communication, or target receptive and expressive language goals. In this paper, we review a collection of iPad apps that can be used to address a variety of receptive and expressive communication needs. Based on recommendations from Gosnell, Costello, and Shane (2011), we describe the features of 21 apps that can serve as a reference guide for speech-language pathologists. We systematically identified 21 apps that use symbols only, symbols and text-to-speech, and text-to-speech only. We provide descriptions of the purpose of each app, along with the following feature descriptions: speech settings, representation, display, feedback features, rate enhancement, access, motor competencies, and cost. In this review, we describe these apps and how individuals with complex communication needs can use them for a variety of communication purposes and to target a variety of treatment goals. We present information in a user-friendly table format that clinicians can use as a reference guide.

