Listeners’ preference for computer-synthesized speech over natural speech of people with disabilities.

2014 ◽  
Vol 59 (3) ◽  
pp. 289-297 ◽  
Author(s):  
Steven E. Stern ◽  
Chelsea M. Chobany ◽  
Disha V. Patel ◽  
Justin J. Tressler


2020 ◽  
Vol 62 (2) ◽  
pp. 7-17
Author(s):  
Karolina Jankowska ◽  
Tomasz Kuczmarski ◽  
Grażyna Demenko

Abstract Shadowing of natural speech has been discussed in many studies and papers; however, little is known about human phonetic convergence to synthesized speech. To investigate this issue, an experiment was conducted in the Polish language using two types of stimuli: natural speech and synthesized speech. Five sets of sentences covering various Polish phonetic phenomena were prepared, and a group of twenty participants was recorded, yielding a total of 100 samples for each phenomenon. The results show convergence to both natural and synthesized speech in sets 1, 2, and 4, whereas no convergence was observed in sets 3 and 5. Baseline production showed that the great majority of participants preferred the ɛn/ɛm variant of the phonetic feature, which was reflected in 83 of 100 sentences. When shadowing natural speech, participants changed ɛn/ɛm to ɛw/ɛ̃ in 26 cases and ɛw/ɛ̃ to ɛn/ɛm in 4 cases; when shadowing synthesized speech, they shifted from ɛn/ɛm to ɛw/ɛ̃ in 18 sentences and from ɛw/ɛ̃ to ɛn/ɛm in 4. Intonation convergence was also observed in the perceptual analysis, although the analysis of F0 statistics did not show statistically significant differences.
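As a worked check of the shift counts reported in the abstract, the sketch below tallies how the shadowing shifts move the baseline variant counts. The dictionary keys and the `apply_shifts` helper are illustrative names, not from the study:

```python
# Baseline: 83 of 100 sentences used the ɛn/ɛm variant, 17 used ɛw/ɛ̃.
baseline = {"en_em": 83, "ew_nasal": 17}

def apply_shifts(counts, to_nasal, to_en):
    """Apply reported shadowing shifts: `to_nasal` tokens move from
    ɛn/ɛm to ɛw/ɛ̃, and `to_en` tokens move the other way."""
    return {
        "en_em": counts["en_em"] - to_nasal + to_en,
        "ew_nasal": counts["ew_nasal"] + to_nasal - to_en,
    }

natural = apply_shifts(baseline, to_nasal=26, to_en=4)  # shadowing natural speech
synth = apply_shifts(baseline, to_nasal=18, to_en=4)    # shadowing synthesized speech
print(natural)  # {'en_em': 61, 'ew_nasal': 39}
print(synth)    # {'en_em': 69, 'ew_nasal': 31}
```

The tally makes the reported asymmetry concrete: shadowing natural speech shifted more tokens toward ɛw/ɛ̃ (26) than shadowing synthesized speech did (18).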


1987 ◽  
Vol 30 (3) ◽  
pp. 425-431 ◽  
Author(s):  
Julia Hoover ◽  
Joe Reichle ◽  
Dianne Van Tasell ◽  
David Cole

The intelligibility of two speech synthesizers [ECHO II (Street Electronics, 1982) and VOTRAX (VOTRAX Division, 1981)] was compared to the intelligibility of natural speech in each of three contextual conditions: (a) single words, (b) "low-probability sentences" in which the last word could not be predicted from preceding context, and (c) "high-probability sentences" in which the last word could be predicted from preceding context. Additionally, the effect of practice on performance in each condition was examined. Natural speech was more intelligible than either type of synthesized speech regardless of word/sentence condition. In both sentence conditions, VOTRAX speech was significantly more intelligible than ECHO II speech. No practice effect was observed for VOTRAX, while an ascending linear trend occurred for ECHO II. Implications for the use of inexpensive speech synthesis units as components of augmentative communication aids for persons with severe speech and/or language impairments are discussed.


2002 ◽  
Vol 45 (4) ◽  
pp. 802-810 ◽  
Author(s):  
Mary E. Reynolds ◽  
Charlene Isaacs-Duvall ◽  
Michelle Lynn Haddox

This study examined the effect of listening practice on the ability of young adults to comprehend natural speech and DECtalk synthesized speech by having them perform a sentence verification task over a 5-day period. Results showed that participants' response latencies to sentences presented in both types of speech shortened at a similar rate across the 5-day period, although latencies remained significantly longer in response to DECtalk than to natural speech throughout. These results suggest that high-quality synthesized speech, such as DECtalk, can be useful in many human factors applications.


1988 ◽  
Vol 19 (4) ◽  
pp. 401-409 ◽  
Author(s):  
Holly J. Massey

The Token Test for Children was given in a synthesized-speech version and a natural-speech version to 11 language-impaired children aged 8 years, 9 months to 10 years, 1 month and to 11 control subjects matched for age and sex. The scores of the language-impaired children on the synthesized version were significantly lower than (a) the synthesized-speech scores of the control group and (b) their own scores on the natural-speech version. Task complexity was a significant factor for the experimental group. Language-impaired children may have difficulty understanding some synthesized voice commands.


Author(s):  
H. S. Venkatagiri

Speech generating devices (SGDs) – both dedicated devices and general-purpose computers with suitable hardware and software – are important to children and adults who might otherwise not be able to communicate adequately through speech. These devices generate speech in one of two ways: they play back speech that was recorded previously (digitized speech), or they synthesize speech from text (text-to-speech, or TTS, synthesis). This chapter places digitized and synthesized speech within the broader domain of digital speech technology. The technical requirements for digitized and synthesized speech are discussed, along with recent advances in improving the accuracy, intelligibility, and naturalness of synthesized speech. The factors to consider in selecting digitized and synthesized speech for augmenting expressive communication abilities in people with disabilities are also discussed. Finally, research needs in synthesized speech are identified.


Author(s):  
Phung Trung Nghia ◽  
Nguyen Van Tao ◽  
Pham Thi Mai Huong ◽  
Nguyen Thi Bich Diep ◽  
Phung Thi Thu Hien

The articulators typically move smoothly during speech production, so the speech features of natural speech are generally smooth. However, over-smoothing causes "muffledness" and reduces the identifiability of emotions, expressions, and styles in synthesized speech, which can affect the perceived naturalness of synthesized speech. In the literature, statistical variances of static spectral features have been used as a measure of smoothness in synthesized speech, but they are not sufficient. This paper proposes a speech smoothness measure that can be applied efficiently to evaluate the smoothness of synthesized speech. Experiments show that the proposed measures are reliable and efficient for measuring the smoothness of different kinds of synthesized speech.
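The abstract does not give the authors' formula, so the sketch below is only one plausible variance-based smoothness proxy under assumed conventions: features are a (T, D) array of per-frame spectral coefficients, and smoothness is judged by the variance of frame-to-frame deltas (a jittery trajectory has larger delta variance than a smooth one):

```python
import numpy as np

def delta_variance(frames):
    """Smoothness proxy: mean variance of frame-to-frame deltas of
    spectral features. Lower values indicate smoother trajectories.
    `frames` is a (T, D) array of per-frame features (e.g. MFCCs)."""
    deltas = np.diff(frames, axis=0)  # frame-to-frame change, shape (T-1, D)
    return float(np.mean(np.var(deltas, axis=0)))

# A smooth trajectory (slow sinusoid) vs. the same trajectory with added jitter:
t = np.linspace(0, 2 * np.pi, 200)
smooth = np.stack([np.sin(t), np.cos(t)], axis=1)
rng = np.random.default_rng(0)
jittery = smooth + rng.normal(scale=0.1, size=smooth.shape)
assert delta_variance(smooth) < delta_variance(jittery)
```

Note that this illustrates the general idea of a variance-based smoothness measure; the paper's proposed measure may differ in both the features used and the statistic computed.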


2011 ◽  
Vol 97 (5) ◽  
pp. 852-868 ◽  
Author(s):  
Peter Počta ◽  
Jan Holub

This paper investigates the impact of independent and dependent losses and coding on speech quality predictions provided by the PESQ (also known as ITU-T P.862) and P.563 models, when both naturally produced and synthesized speech are used. Two synthesized speech samples generated with two different text-to-speech systems and one naturally produced sample are investigated. In addition, we assess the variability of PESQ's and P.563's predictions with respect to the type of speech used (naturally produced or synthesized) and the loss conditions, as well as their accuracy, by comparing the predictions with subjective assessments. The results show that there is no difference between the impact of packet loss on naturally produced speech and on synthesized speech. On the other hand, the impact of coding differs for the two types of stimuli. In addition, synthesized speech seems to be insensitive to the degradations introduced by most of the codecs investigated here. The reasons for these findings are discussed in detail. Finally, it is concluded that both models are capable of predicting the quality of transmitted synthesized speech under the investigated conditions to a certain degree. As expected, PESQ achieves the best performance over almost all of the investigated conditions.
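Accuracy assessments of this kind typically correlate the objective model's predictions with subjective listening scores. A minimal sketch of that comparison is below; the two score lists are fabricated illustrative values, not data from the paper:

```python
import statistics

def pearson(x, y):
    """Pearson correlation between objective predictions and subjective scores."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Illustrative (fabricated) values: PESQ-style predictions vs. subjective MOS,
# both on the usual 1-5 quality scale.
predicted = [4.1, 3.6, 2.9, 2.2, 1.8]
subjective = [4.3, 3.8, 3.0, 2.5, 1.6]
print(round(pearson(predicted, subjective), 3))
```

A correlation close to 1.0 would indicate that the objective model ranks the degradation conditions much as listeners do, which is the sense in which the paper judges the models "capable of predicting quality to a certain degree."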


Author(s):  
Melissa A. Pierce

In countries other than the United States, the study and practice of speech-language pathology is little known or nonexistent. Recognition of professionals in the field is minimal. Speech-language pathologists in countries where speech-language pathology is a widely recognized and respected profession often seek to share their expertise in places where little support is available for individuals with communication disorders. The Peace Corps offers a unique, long-term volunteer opportunity to people with a variety of backgrounds, including speech-language pathologists. Though Peace Corps programs do not specifically focus on speech-language pathology, many are easily adapted to the profession because they support populations of people with disabilities. This article describes how the needs of local children with communication disorders are readily addressed by a Special Education Peace Corps volunteer.


1983 ◽  
Vol 26 (4) ◽  
pp. 516-524 ◽  
Author(s):  
Donald J. Sharf ◽  
Ralph N. Ohde

Adult and Child manifolds were generated by synthesizing 5 × 5 matrices of /Cej/-type utterances in which F2 and F3 frequencies were systematically varied. Manifold stimuli were presented to 11 graduate-level speech-language pathology students in two conditions: (a) a rating condition in which stimuli were rated on a 4-point scale between good /r/ and good /w/; and (b) a labeling condition in which stimuli were labeled as "R," "W," "distorted R," or "N" (for none of the previous choices). It was found that (a) stimuli with low F2 and high F3 frequencies were rated 1.0–1.4, those with high F2 and low F3 frequencies were rated 3.6–4.0, and those with intermediate values were rated 1.5–3.5; (b) stimuli rated 1.0–1.4 were labeled as "W" and stimuli rated 3.6–4.0 were labeled as "R"; (c) none of the Child manifold stimuli were labeled as distorted "R," and one of the Adult manifold stimuli approached the percentage of identification observed for "R" and "W"; and (d) the rating and labeling tasks were performed with a high degree of reliability.

