Using an HPSG grammar for the generation of prosody

Author(s):  
Berthold Crysmann ◽  
Philipp von Böselager

In this paper, we report on an experiment showing how the introduction of prosodic information from detailed syntactic structures into synthetic speech leads to better disambiguation of structurally ambiguous sentences. Using modifier attachment (MA) ambiguities and subject/object fronting (OF) in German as test cases, we show that prosody which is automatically generated from deep syntactic information provided by an HPSG generator can lead to considerable disambiguation effects, and can even override a strong semantics-driven bias. The architecture used in the experiment, consisting of the LKB generator running a large-scale grammar for German, a syntax-prosody interface module, and the speech synthesis system MARY, is shown to be a valuable platform for testing hypotheses in intonation studies.

2021 ◽  
Vol 14 (3) ◽  
pp. 1-26
Author(s):  
Danielle Bragg ◽  
Katharina Reinecke ◽  
Richard E. Ladner

As conversational agents and digital assistants become increasingly pervasive, understanding their synthetic speech becomes increasingly important. Simultaneously, speech synthesis is becoming more sophisticated and manipulable, providing the opportunity to optimize speech rate to save users time. However, little is known about people’s abilities to understand fast speech. In this work, we provide an extension of the first large-scale study on human listening rates, enlarging the prior study run with 453 participants to 1,409 participants and adding new analyses on this larger group. Run on LabintheWild, it used volunteer participants, was screen reader accessible, and measured listening rate by accuracy at answering questions spoken by a screen reader at various rates. Our results show that people who are visually impaired, who often rely on audio cues and access text aurally, generally have higher listening rates than sighted people. The findings also suggest a need to expand the range of rates available on personal devices. These results demonstrate the potential for users to learn to listen to faster rates, expanding the possibilities for human-conversational agent interaction.


Author(s):  
Mahbubur R. Syed ◽  
Shuvro Chakrobartty ◽  
Robert J. Bignall

Speech synthesis is the process of producing natural-sounding, highly intelligible synthetic speech simulated by a machine in such a way that it sounds as if it were produced by a human vocal system. A text-to-speech (TTS) synthesis system is a computer-based system where the input is text and the output is a simulated vocalization of that text. Before the 1970s, most speech synthesis was achieved with hardware, but this was costly and it proved impossible to properly simulate natural speech production. Since the 1970s, the use of computers has made the practical application of speech synthesis more feasible.


Author(s):  
Vo Quang Dieu Ha ◽  
Nguyen Manh Tuan ◽  
Cao Xuan Nam ◽  
Pham Minh Nhut ◽  
Vu Hai Quan

This paper presents a complete specification of the Vietnamese speech synthesis system named VOS (Voice of Southern Vietnam). Because current Vietnamese text-to-speech systems lack naturalness in their synthetic speech, VOS is based on the unit selection approach, which aims to achieve maximum naturalness. VOS consists of three main parts: a corpus manager, a synthesizer, and a transliteration model. The corpus manager handles automated speech indexing and segmentation for the unit selection executed by the synthesizer, while the transliteration model deals with the pronunciation of words from foreign languages. A comparative experimental evaluation of VnSpeech, VietVoice, and VOS was conducted using the ITU-T P.85 standard. The results show that VOS outperforms the former two TTS systems.
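The unit selection approach that the VOS abstract refers to is conventionally framed as choosing, for each target phone, one candidate unit from a recorded inventory so as to minimize a combined target cost (mismatch with the desired specification) and concatenation cost (join smoothness), typically with a Viterbi-style search. A minimal illustrative sketch, not VOS's actual implementation; the cost functions, unit attributes, and inventory below are toy stand-ins:

```python
# Illustrative unit-selection search: pick one candidate unit per target
# position, minimizing target cost + concatenation cost (Viterbi-style DP).
# Costs and the candidate inventory are hypothetical placeholders.

def target_cost(unit, target):
    # Mismatch penalty between a candidate unit and the desired target phone.
    return 0.0 if unit["phone"] == target else 1.0

def concat_cost(prev_unit, unit):
    # Join penalty: here, a simple pitch-discontinuity proxy.
    return abs(prev_unit["f0"] - unit["f0"]) / 100.0

def select_units(targets, inventory):
    """Return the lowest-cost unit sequence for the target phone string."""
    layers = [inventory[t] for t in targets]
    # history[i][j] = (cumulative cost of best path ending in unit j, backpointer)
    history = [[(target_cost(u, targets[0]), None) for u in layers[0]]]
    for i in range(1, len(layers)):
        current = []
        for u in layers[i]:
            tc = target_cost(u, targets[i])
            cost, back = min(
                (history[-1][j][0] + concat_cost(p, u) + tc, j)
                for j, p in enumerate(layers[i - 1])
            )
            current.append((cost, back))
        history.append(current)
    # Backtrack from the cheapest final candidate.
    idx = min(range(len(history[-1])), key=lambda j: history[-1][j][0])
    path = []
    for i in range(len(layers) - 1, -1, -1):
        path.append(layers[i][idx])
        idx = history[i][idx][1]
    return list(reversed(path))

inventory = {
    "a": [{"phone": "a", "f0": 120}, {"phone": "a", "f0": 180}],
    "b": [{"phone": "b", "f0": 125}, {"phone": "b", "f0": 200}],
}
units = select_units(["a", "b"], inventory)
print([u["f0"] for u in units])  # picks the smoother join: [120, 125]
```

Real systems add many more cost terms (spectral, duration, prosodic context) and search over much larger inventories, but the dynamic-programming shape is the same.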


Author(s):  
Pongsathon Janyoi ◽  
Pusadee Seresangtakul

This paper describes a speech synthesis system for Isarn, a regional dialect spoken in northeastern Thailand. In this study, we focus on improving the system's prosody generation by using additional context features. To develop the system, the speech parameters (mel-cepstra and fundamental frequencies of phonemes within different phonetic contexts) were modelled using hidden Markov models (HMMs). Synthetic speech was generated by converting the input text into context-dependent phonemes. Speech parameters were generated from the trained HMMs, according to the context-dependent phonemes, and were then synthesized through a speech vocoder. In this study, systems were trained using three different feature sets: basic contextual features, tonal features, and syllable-context features. Objective and subjective tests were conducted to determine the performance of the proposed system. The results indicated that the addition of the syllable-context features significantly improved the naturalness of the synthesized speech.
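The HMM-based pipeline the abstract describes runs text through context-dependent phoneme labels, looks up trained models to emit parameter frames, and hands those to a vocoder. A minimal sketch of that flow, assuming invented label formats and toy "trained" models (real systems use rich quinphone labels with tone and syllable context, state-level durations, and a signal-processing vocoder):

```python
# Skeleton of the HMM-based parametric synthesis flow: text-derived phones ->
# context-dependent labels -> parameter frames from trained models -> vocoder.
# MODELS, the label scheme, and the frame contents are hypothetical.

# Per-label mean mel-cepstrum vector, mean log-F0, and a fixed frame count
# standing in for trained HMM output and duration models.
MODELS = {
    "sil": {"mcep": [0.0, 0.0], "lf0": 0.0, "frames": 2},
    "k-a+t": {"mcep": [1.2, -0.3], "lf0": 4.8, "frames": 3},
}

def to_context_labels(phones):
    """Build triphone-style labels: left-center+right for each phone."""
    padded = ["sil"] + phones + ["sil"]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]

def generate_parameters(labels):
    """Emit (mcep, lf0) frames per label; unseen labels back off to 'sil'."""
    frames = []
    for label in labels:
        model = MODELS.get(label, MODELS["sil"])
        frames.extend([(model["mcep"], model["lf0"])] * model["frames"])
    return frames

labels = to_context_labels(["k", "a", "t"])
print(labels)                              # ['sil-k+a', 'k-a+t', 'a-t+sil']
print(len(generate_parameters(labels)))    # 7 frames: 2 (backoff) + 3 + 2
```

The backoff to a default model mirrors, very loosely, how real systems handle unseen contexts via decision-tree clustering of HMM states rather than a single fallback.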


2014 ◽  
Vol 571-572 ◽  
pp. 858-862
Author(s):  
Zhi Qiang Wu ◽  
Hong Zhi Yu ◽  
Shu Hui Wan

Pronunciation conversion is a prerequisite for building a speech synthesis system, and its accuracy directly affects the quality of the synthetic speech. By studying the characteristics of Tibetan words and Lhasa pronunciation, together with current methods of pronunciation conversion for the Lhasa dialect of Tibetan and the needs of speech synthesis research, we designed and implemented a pronunciation conversion system that can be applied to Lhasa-dialect Tibetan speech synthesis. In tests, the system achieved 95.3% accuracy, and the conversion results are largely able to meet the needs of a Tibetan speech synthesis system.
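Pronunciation conversion systems of this kind are commonly structured as an exception dictionary consulted first, with rule-based grapheme-to-pronunciation mapping as a fallback. A toy sketch of that general shape; the syllables and mappings below are invented placeholders, not actual Tibetan orthography or Lhasa pronunciation rules:

```python
# Dictionary-plus-rules pronunciation conversion sketch. Whole-syllable
# exceptions are looked up first; otherwise the longest matching onset
# rule is applied. All entries here are hypothetical examples.

EXCEPTIONS = {"bkra": "tra"}           # whole-syllable lookup first
ONSET_RULES = {"bk": "k", "br": "tr"}  # fallback onset rewriting

def convert_syllable(syl):
    if syl in EXCEPTIONS:
        return EXCEPTIONS[syl]
    # Try longer onsets before shorter ones so "bkr..." isn't cut short.
    for onset in sorted(ONSET_RULES, key=len, reverse=True):
        if syl.startswith(onset):
            return ONSET_RULES[onset] + syl[len(onset):]
    return syl

def convert(word):
    # Syllables are assumed to be pre-segmented, joined by ".".
    return " ".join(convert_syllable(s) for s in word.split("."))

print(convert("bkra.shis"))  # → "tra shis"
```

A production system would also need tone assignment and context-sensitive rules across syllable boundaries, which is where most of the reported accuracy effort would lie.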


Author(s):  
S.J. Eady ◽  
T.M.S. Hemphill ◽  
J.R. Woolsey ◽  
J.A.W. Clayards
