Audiovisual Speech Synthesis using Tacotron2

Mapping Intimacies ◽

10.1145/3462244.3479883 ◽

2021 ◽

Author(s):

Ahmed Hussen Abdelaziz ◽

Anushree Prasanna Kumar ◽

Chloe Seivwright ◽

Gabriele Fanelli ◽

Justin Binder ◽

...

Keyword(s):

Speech Synthesis ◽

Audiovisual Speech

Merging methods of speech visualization

ZAS Papers in Linguistics ◽

10.21248/zaspil.40.2005.255 ◽

2005 ◽

Vol 40 ◽

pp. 19-32

Author(s):

Sascha Fagel

Keyword(s):

Speech Synthesis ◽

Facial Feature ◽

Visual Speech ◽

Reference Image ◽

Audiovisual Speech ◽

Dominance Model ◽

Visual Speech Synthesis ◽

Displacement Vectors

The author presents MASSY, the MODULAR AUDIOVISUAL SPEECH SYNTHESIZER. The system combines two approaches to visual speech synthesis. Two control models are implemented: a data-based di-viseme model and a rule-based dominance model, both of which produce control commands in a parameterized articulation space. Analogously, two visualization methods are implemented: an image-based (video-realistic) face model and a 3D synthetic head. Either face model can be driven by either the data-based or the rule-based articulation model. The high-level visual speech synthesis generates a sequence of control commands for the visible articulation. For every virtual articulator (articulation parameter), the 3D synthetic face model defines a set of displacement vectors for the vertices of the 3D objects of the head. The vertices of the 3D synthetic head are then moved by linear combinations of these displacement vectors to visualize articulation movements. For the image-based video synthesis, a single reference image is deformed to fit the facial properties derived from the control commands. Facial feature points and facial displacements have to be defined for the reference image. The algorithm can also use an image database with appropriately annotated facial properties. An example database was built automatically from video recordings. Both the 3D synthetic face and the image-based face generate visual speech that is capable of increasing the intelligibility of audible speech. Other well-known image-based audiovisual speech synthesis systems such as MIKETALK and VIDEO REWRITE concatenate pre-recorded single images or video sequences, respectively. Parametric talking heads such as BALDI control a parametric face with a parametric articulation model. The presented system demonstrates the compatibility of parametric and data-based visual speech synthesis approaches.
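
The vertex deformation described in the abstract above (vertices moved by linear combinations of per-articulator displacement vectors) can be illustrated with a minimal sketch. This is not MASSY's implementation: the function name `deform_vertices`, the articulator names `jaw_open` and `lip_round`, and all numeric values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def deform_vertices(
    rest: List[Vec3],
    displacements: Dict[str, List[Vec3]],
    activations: Dict[str, float],
) -> List[Vec3]:
    """Move each rest-pose vertex by a weighted sum of displacement vectors.

    displacements maps an articulator name to one displacement vector per
    vertex; activations maps the same names to scalar control values.
    Both the names and the data layout are assumptions of this sketch.
    """
    deformed: List[Vec3] = []
    for i, (x, y, z) in enumerate(rest):
        for name, weight in activations.items():
            dx, dy, dz = displacements[name][i]
            # Linear combination: add the weighted displacement.
            x += weight * dx
            y += weight * dy
            z += weight * dz
        deformed.append((x, y, z))
    return deformed


# Hypothetical two-vertex "mouth" with two articulation parameters.
rest_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
displacement_sets = {
    "jaw_open": [(0.0, -1.0, 0.0), (0.0, -1.0, 0.0)],
    "lip_round": [(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0)],
}
control_command = {"jaw_open": 0.5, "lip_round": 0.2}

print(deform_vertices(rest_pose, displacement_sets, control_command))
```

In this sketch a sequence of control commands, one per video frame, would be passed through `deform_vertices` in turn to animate the head; how MASSY actually stores or schedules its commands is not stated in the abstract.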

Development of text-to-audiovisual speech synthesis to support interactive language learning on a mobile device

2013 IEEE 4th International Conference on Cognitive Infocommunications (CogInfoCom) ◽

10.1109/coginfocom.2013.6719170 ◽

2013 ◽

Author(s):

Wai-Kim Leung ◽

Ka-Wa Yuen ◽

Ka-Ho Wong ◽

Helen Meng

Keyword(s):

Language Learning ◽

Mobile Device ◽

Speech Synthesis ◽

Audiovisual Speech

Development of Indonesian audiovisual speech synthesis system for assistance children with delayed speech

The Journal of the Acoustical Society of America ◽

10.1121/1.5146835 ◽

2020 ◽

Vol 148 (4) ◽

pp. 2470-2470

Author(s):

Elok Anggrayni ◽

Dhany Arifianto ◽

Nyilo Purnami ◽

Joko Sarwono ◽

Sangsaka Wira

Keyword(s):

Speech Synthesis ◽

Audiovisual Speech ◽

Synthesis System

Designing and Deploying an Interaction Modality for Articulatory-Based Audiovisual Speech Synthesis

10.1007/978-3-030-87802-3_4 ◽

2021 ◽

pp. 36-49

Author(s):

Nuno Almeida ◽

Diogo Cunha ◽

Samuel Silva ◽

António Teixeira

Keyword(s):

Speech Synthesis ◽

Audiovisual Speech

An Anthropomorphic Perspective for Audiovisual Speech Synthesis

Proceedings of the 10th International Joint Conference on Biomedical Engineering Systems and Technologies ◽

10.5220/0006150201630172 ◽

2017 ◽

Author(s):

Samuel Silva ◽

António Teixeira

Keyword(s):

Speech Synthesis ◽

Audiovisual Speech

Project MEMNON: Extending Speech Production Studies to Silent Speech, Dynamic Sounds and Audiovisual Speech Synthesis

10.21437/iberspeech.2021-31 ◽

2021 ◽

Author(s):

Samuel Silva ◽

António Teixeira ◽

Nuno Almeida ◽

Diogo Cunha ◽

David Ferreira ◽

...

Keyword(s):

Speech Production ◽

Speech Synthesis ◽

Audiovisual Speech ◽

Production Studies

TSynC-3miti: Audiovisual Speech Synthesis Database from Found Data

2020 23rd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA) ◽

10.1109/o-cocosda50338.2020.9295001 ◽

2020 ◽

Author(s):

Ausdang Thangthai ◽

Sumonmas Thatphithakkul ◽

Kwanchiva Thangthai ◽

Arnon Namsanit

Keyword(s):

Speech Synthesis ◽

Audiovisual Speech

Audiovisual speech synthesis: An overview of the state-of-the-art

Speech Communication ◽

10.1016/j.specom.2014.11.001 ◽

2015 ◽

Vol 66 ◽

pp. 182-217

Author(s):

Wesley Mattheyses ◽

Werner Verhelst

Keyword(s):

Speech Synthesis ◽

State Of The Art ◽

The State ◽

Audiovisual Speech

Conditional Variational Auto-Encoder for Text-Driven Expressive AudioVisual Speech Synthesis

10.21437/interspeech.2019-2848 ◽

2019 ◽

Author(s):

Sara Dahmani ◽

Vincent Colotte ◽

Valérian Girard ◽

Slim Ouni

Keyword(s):

Speech Synthesis ◽

Audiovisual Speech

Learning emotions latent representation with CVAE for text-driven expressive audiovisual speech synthesis

Neural Networks ◽

10.1016/j.neunet.2021.04.021 ◽

2021 ◽

Author(s):

Sara Dahmani ◽

Vincent Colotte ◽

Valérian Girard ◽

Slim Ouni

Keyword(s):

Speech Synthesis ◽

Audiovisual Speech
