Advances in Computer Speech Synthesis and Implications for Assistive Technology

Author(s):  
H. Timothy Bunnell ◽  
Christopher A. Pennington

The authors review developments in Computer Speech Synthesis (CSS) over the past two decades, focusing on the relative advantages and disadvantages of the two dominant technologies: rule-based synthesis and data-based synthesis. Based on this discussion, they conclude that data-based synthesis is presently the best technology for use in Speech Generating Devices (SGDs) used as communication aids. They examine the benefits associated with data-based synthesis, such as personal voices, greater intelligibility, and improved naturalness; discuss problems that are unique to data-based synthesis systems; and highlight areas where all types of CSS need to be improved for use in assistive devices. Much of this discussion is from the perspective of the ModelTalker project, a data-based CSS system for voice banking that provides practical, affordable personal synthetic voices for people using SGDs to communicate. The authors conclude by considering some emerging technologies that may prove promising in future SGDs.

Author(s):  
Barbara J. Kouba ◽  
Brian Newberry

Even though the term is relatively new, assistive technologies of various types have helped people overcome, achieve, and perform for many years and come in many forms. Indeed, many familiar technologies, some that might even be considered mainstream, were initially conceived as assistive devices. Recently, assistive technology has become the subject of legislation, including the Rehabilitation Act and the Americans with Disabilities Act, and much more legislation regarding access to and funding for assistive technology is expected. Currently, much attention in the area of assistive technology focuses on computer and communications technology, including portable devices, which help individuals use powerful tools for accessing information and communicating with others. The future of assistive technology will certainly continue these areas of development but will also likely begin to adopt newer methods for interfacing various assistive technologies directly with the human sensory system. As has happened in the past, it is expected that many technologies initially created as assistive will be adopted by non-disabled individuals.


2005 ◽  
Vol 40 ◽  
pp. 19-32
Author(s):  
Sascha Fagel

The author presents MASSY, the MODULAR AUDIOVISUAL SPEECH SYNTHESIZER. The system combines two approaches to visual speech synthesis. Two control models are implemented: a (data-based) di-viseme model and a (rule-based) dominance model, both of which produce control commands in a parameterized articulation space. Analogously, two visualization methods are implemented: an image-based (video-realistic) face model and a 3D synthetic head. Both face models can be driven by both the data-based and the rule-based articulation model. The high-level visual speech synthesis generates a sequence of control commands for the visible articulation. For every virtual articulator (articulation parameter), the 3D synthetic face model defines a set of displacement vectors for the vertices of the 3D objects of the head. The vertices of the 3D synthetic head are then moved by linear combinations of these displacement vectors to visualize articulation movements. For the image-based video synthesis, a single reference image is deformed to fit the facial properties derived from the control commands. Facial feature points and facial displacements have to be defined for the reference image. The algorithm can also use an image database with appropriately annotated facial properties. An example database was built automatically from video recordings. Both the 3D synthetic face and the image-based face generate visual speech that is capable of increasing the intelligibility of audible speech. Other well-known image-based audiovisual speech synthesis systems like MIKETALK and VIDEO REWRITE concatenate pre-recorded single images or video sequences, respectively. Parametric talking heads like BALDI control a parametric face with a parametric articulation model. The presented system demonstrates the compatibility of parametric and data-based visual speech synthesis approaches.
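
The vertex update described in this abstract (each vertex moved by a linear combination of per-articulator displacement vectors) can be illustrated with a minimal sketch. This is not MASSY's code: the function name `deform_vertices`, the two-vertex mesh, and the articulator names `jaw_open` and `lip_round` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def deform_vertices(
    rest: List[Vec3],
    displacements: Dict[str, List[Vec3]],
    weights: Dict[str, float],
) -> List[Vec3]:
    """Move each rest-pose vertex by a weighted sum of displacement vectors.

    displacements[name][i] is the displacement of vertex i when the
    (hypothetical) articulation parameter `name` is at full activation;
    weights[name] is the current activation of that parameter.
    """
    moved = []
    for i, (x, y, z) in enumerate(rest):
        for name, w in weights.items():
            dx, dy, dz = displacements[name][i]
            x += w * dx
            y += w * dy
            z += w * dz
        moved.append((x, y, z))
    return moved


# Made-up two-vertex "lip" mesh and two made-up articulation parameters.
REST = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
DISPLACEMENTS = {
    "jaw_open": [(0.0, -1.0, 0.0), (0.0, -1.0, 0.0)],
    "lip_round": [(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0)],
}


def demo() -> List[Vec3]:
    # One control command: jaw half open, lips fully rounded.
    return deform_vertices(REST, DISPLACEMENTS, {"jaw_open": 0.5, "lip_round": 1.0})
```

In this toy setup, a sequence of control commands would simply be a sequence of `weights` dictionaries, one per animation frame; how the actual system schedules or smooths those commands is not specified in the abstract.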


2021 ◽  
Author(s):  
Bulat Zagidullin ◽  
Ziyan Wang ◽  
Yuanfang Guan ◽  
Esa Pitkänen ◽  
Jing Tang

Application of machine and deep learning (ML/DL) methods in drug discovery and cancer research has gained a considerable amount of attention in recent years. As the field grows, it becomes crucial to systematically evaluate the performance of novel DL solutions in relation to established techniques. To this end, we compare rule-based and data-driven molecular representations in prediction of drug combination sensitivity and drug synergy scores using standardized results of 14 high throughput screening studies, comprising 64,200 unique combinations of 4,153 molecules tested in 112 cancer cell lines. We evaluate the clustering performance of molecular fingerprints and quantify their similarity by adapting the Centred Kernel Alignment metric. Our work demonstrates that in order to identify an optimal representation type it is necessary to supplement quantitative benchmark results with qualitative considerations, such as model interpretability and robustness, which may vary between and throughout preclinical drug development projects.
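
For readers unfamiliar with Centred Kernel Alignment, the sketch below computes the standard linear-kernel variant between two representations of the same set of items. It is not the adaptation used in the paper, whose details the abstract does not give; the function names and the toy fingerprint matrices `FP_A` and `FP_B` are assumptions made for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def center_columns(m: Matrix) -> Matrix:
    """Subtract each column's mean so every feature has zero mean."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def gram(m: Matrix) -> Matrix:
    """Linear-kernel Gram matrix: pairwise dot products between rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in m] for r1 in m]


def frobenius_inner(a: Matrix, b: Matrix) -> float:
    """Sum of element-wise products of two equally shaped matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA between two representations of the same n items.

    x and y must have the same number of rows (items); the number of
    columns (features) may differ. At least one column in each matrix
    must be non-constant, otherwise the denominator is zero.
    """
    kx = gram(center_columns(x))
    ky = gram(center_columns(y))
    return frobenius_inner(kx, ky) / math.sqrt(
        frobenius_inner(kx, kx) * frobenius_inner(ky, ky)
    )


# Toy data: four molecules described by a 3-bit and a 2-dimensional
# representation (values are made up).
FP_A = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
FP_B = [[0.2, 1.0], [0.9, 0.1], [0.4, 0.4], [0.7, 0.8]]
SIMILARITY = linear_cka(FP_A, FP_B)
```

The score lies between 0 and 1 and is unchanged when either representation is rescaled by a constant, which is one reason this family of measures is convenient for comparing representations of different dimensionality.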


2020 ◽  
Vol 12 (1) ◽  
pp. 113-121
Author(s):  
Carla Piazzon Ramos Vieira ◽  
Luciano Antonio Digiampietri

The technologies supporting Artificial Intelligence (AI) have advanced rapidly over the past few years, and AI is becoming commonplace in every aspect of life, with prospects such as self-driving cars and earlier health diagnosis. For this to happen soon, the community must overcome the barrier of explainability, an inherent problem of the latest models (e.g. Deep Neural Networks) that was not present in the previous wave of AI (linear and rule-based models). Most of these recent models are used as black boxes, with little or no understanding of how different features influence the model prediction, which undermines algorithmic transparency. In this paper, we focus on how much we can understand the decisions made by an SVM classifier in a post-hoc, model-agnostic approach. Furthermore, we train a tree-based model (inherently interpretable) on labels from the SVM, called secondary training data, to provide explanations; we compare the permutation importance method to more commonly used measures such as accuracy and show that our methods are more reliable and more meaningful. We also outline the main challenges for such methods and conclude that model-agnostic interpretability is a key component in making machine learning more trustworthy.
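
Permutation importance, one of the model-agnostic techniques named in this abstract, can be sketched in a few lines: shuffle one feature column at a time and record how much the model's accuracy drops. The example below is an illustration under stated assumptions, not the authors' pipeline: `black_box` is a hypothetical stand-in for a trained classifier (the paper uses an SVM), and the data are synthetic.

```python
import random
from typing import Callable, List

Row = List[float]


def accuracy(predict: Callable[[Row], int], rows: List[Row], labels: List[int]) -> float:
    """Fraction of rows whose prediction matches the label."""
    hits = sum(1 for r, y in zip(rows, labels) if predict(r) == y)
    return hits / len(rows)


def permutation_importance(
    predict: Callable[[Row], int],
    rows: List[Row],
    labels: List[int],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the data set with only column j permuted.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def black_box(row: Row) -> int:
    # Hypothetical opaque classifier: it secretly uses only feature 0.
    return 1 if row[0] > 0.5 else 0


# Synthetic data; labels are the black box's own predictions, mirroring
# the idea of explaining a model through the labels it produces.
ROWS = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
LABELS = [black_box(r) for r in ROWS]
IMPORTANCES = permutation_importance(black_box, ROWS, LABELS)
```

Because the stand-in model ignores feature 1, shuffling that column cannot change any prediction and its importance is exactly zero, while feature 0 receives a positive score. The same function works for any model exposed as a `predict` callable, which is what makes the approach model-agnostic.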


2019 ◽  
Vol 75 (1) ◽  
pp. 122-125
Author(s):  
Vanessa Ratten

Purpose: Tourism entrepreneurship is an emerging area of study that has both practical and theoretical importance. This paper aims to review past research on tourism entrepreneurship with the view of highlighting neglected areas of study. Design/methodology/approach: A review of the past 75 years is conducted that highlights gaps in need of further research. Findings: There is a focus on lifestyle and sustainable forms of tourism entrepreneurship without taking into account emerging technologies and other forms of entrepreneurship such as digital and societal. Originality/value: This paper places emphasis on the transdisciplinary nature of tourism entrepreneurship that enables researchers to build on multiple disciplines to derive fruitful new areas of research interest.


1986 ◽  
Vol 22 (2) ◽  
pp. 331-354 ◽  
Author(s):  
Ken Lodge

As part of an investigation into rapid speech and its rule-based processes, I want to present an analysis of colloquial spoken Thai and show how different tempi can be related to one another. I also want to see whether the processes displayed by colloquial Thai fit into the general picture of phonological processes which has emerged over the past 15 years or so (roughly Stampe, 1969, onwards) within different theoretical frameworks. In particular I shall try to relate my findings to the increasingly accepted notions of richer phonological structure now being envisaged (e.g. Clements & Keyser, 1983 – tridimensional; Goldsmith, 1976 a & b – autosegmental; Liberman & Prince, 1977 and Kiparsky, 1979 – metrical; Anderson & Ewen, 1980, and Durand, 1986 a – dependency).


2015 ◽  
Vol 64 (2) ◽  
Author(s):  
Mark A. Andor ◽  
Manuel Frondel ◽  
Stephan Sommer

In Europe’s Emission Trading System (ETS), prices for emission permits have remained low for many years now. This fact gave rise to controversies on whether there is a need for fundamentally reforming the ETS. Potential reform proposals include the introduction of a minimum price for certificates and a market stability reserve (MSR). The latter is a rule-based mechanism for steering the volume of permits in the market. While preparing the introduction of this instrument, the European Commission hopes to be able to increase and stabilize certificate prices in the medium and long term. In this article, we recommend retaining the ETS as it is, rather than supplementing it by introducing a minimum price floor or a market stability reserve. Instead, mistakes from the past should be corrected by a single intervention: the final elimination of those 900 million permits that were taken out of the market in 2014, but would again emerge in the market in 2019 and 2020 (backloading).


Author(s):  
Marvin Coto-Jiménez ◽  
John Goddard-Close

Recent developments in speech synthesis have yielded systems capable of producing speech which closely resembles natural speech, and researchers now strive to create models that more accurately mimic human voices. One such development is the incorporation of multiple linguistic styles in various languages and accents. Speech synthesis based on Hidden Markov Models (HMM) is of great interest to researchers, due to its ability to produce sophisticated features with a small footprint. Despite some progress, its quality has not yet reached the level of the current predominant unit-selection approaches, which select and concatenate recordings of real speech, and work has been conducted to try to improve HMM-based systems. In this paper, we present an application of long short-term memory (LSTM) deep neural networks as a postfiltering step in HMM-based speech synthesis. Our motivation stems from a similar desire to obtain characteristics which are closer to those of natural speech. The paper analyzes four types of postfilters obtained using five voices, which range from a single postfilter to enhance all the parameters, to a multi-stream proposal which separately enhances groups of parameters. The different proposals are evaluated using three objective measures and are statistically compared to determine whether the differences between them are significant. The results described in the paper indicate that HMM-based voices can be enhanced using this approach, especially for the multi-stream postfilters on the considered objective measures.

