Improving Deep Neural Network Based Speech Synthesis through Contextual Feature Parametrization and Multi-Task Learning

Journal of Signal Processing Systems ◽

10.1007/s11265-017-1293-z ◽

2017 ◽

Vol 90 (7) ◽

pp. 1025-1037

Author(s):

Zhengqi Wen ◽

Kehuang Li ◽

Zhen Huang ◽

Chin-Hui Lee ◽

Jianhua Tao

Keyword(s):

Neural Network ◽

Speech Synthesis ◽

Deep Neural Network ◽

Task Learning ◽

Contextual Feature

Biomedical semantic indexing by deep neural network with multi-task learning

BMC Bioinformatics ◽

10.1186/s12859-018-2534-2 ◽

2018 ◽

Vol 19 (S20) ◽

Author(s):

Yongping Du ◽

Yunpeng Pan ◽

Chencheng Wang ◽

Junzhong Ji

Keyword(s):

Neural Network ◽

Deep Neural Network ◽

Semantic Indexing

Research on Dungan speech synthesis based on Deep Neural Network

2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP) ◽

10.1109/iscslp.2018.8706713 ◽

2018 ◽

Author(s):

Lijia Chen ◽

Hongwu Yang ◽

Hui Wang

Keyword(s):

Neural Network ◽

Speech Synthesis ◽

Deep Neural Network

Continuous vocoder applied in deep neural network based voice conversion

Multimedia Tools and Applications ◽

10.1007/s11042-019-08198-5 ◽

2019 ◽

Vol 78 (23) ◽

pp. 33549-33572

Author(s):

Mohammed Salah Al-Radhi ◽

Tamás Gábor Csapó ◽

Géza Németh

Keyword(s):

Neural Network ◽

Speech Synthesis ◽

Deep Neural Network ◽

Signal To Noise Ratio ◽

Geometric Approach ◽

Objective Evaluation ◽

Voice Conversion ◽

Listening Tests ◽

Alignment Errors

Abstract: In this paper, a novel vocoder is proposed for a Statistical Voice Conversion (SVC) framework using a deep neural network, where multiple features from the speech of two speakers (source and target) are converted acoustically. Traditional conversion methods focus on the prosodic feature represented by the discontinuous fundamental frequency (F0) and the spectral envelope. Studies have shown that speech analysis/synthesis solutions play an important role in the overall quality of the converted voice. Recently, we have proposed a new continuous vocoder, originally for statistical parametric speech synthesis, in which all parameters are continuous. Therefore, this work introduces a new method that uses a continuous F0 (contF0) in SVC to avoid alignment errors that may occur in voiced and unvoiced segments and can degrade the converted speech. Our contributions are as follows. (1) We integrate into the SVC framework the continuous vocoder, which provides an advanced model of the excitation signal, by converting its contF0, maximum voiced frequency, and spectral features. (2) We show that the feed-forward deep neural network (FF-DNN) using our vocoder yields high-quality conversion. (3) We apply a geometric approach to spectral subtraction (GA-SS) in the final stage of the proposed framework to improve the signal-to-noise ratio of the converted speech. Our experimental results, using two male speakers and one female speaker, show that the converted speech produced by the proposed SVC technique is similar to the target speaker and gives state-of-the-art performance as measured by objective evaluation and subjective listening tests.

A pseudo-task design in multi-task learning deep neural network for speaker recognition

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) ◽

10.1109/iscslp.2016.7918433 ◽

2016 ◽

Author(s):

Xugang Lu ◽

Peng Shen ◽

Yu Tsao ◽

Hisashi Kawai

Keyword(s):

Neural Network ◽

Speaker Recognition ◽

Deep Neural Network ◽

Task Design

Sentence-level control vectors for deep neural network speech synthesis

10.21437/interspeech.2015-128 ◽

2015 ◽

Author(s):

Oliver Watts ◽

Zhizheng Wu ◽

Simon King

Keyword(s):

Neural Network ◽

Speech Synthesis ◽

Deep Neural Network ◽

Level Control

High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network

2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2016.7472653 ◽

2016 ◽

Author(s):

Lauri Juvela ◽

Bajibabu Bollepalli ◽

Manu Airaksinen ◽

Paavo Alku

Keyword(s):

Neural Network ◽

Speech Synthesis ◽

Deep Neural Network ◽

Statistical Parametric Speech Synthesis ◽

Parametric Speech Synthesis

Improving speech recognition in reverberation using a room-aware deep neural network and multi-task learning

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2015.7178925 ◽

2015 ◽

Author(s):

Ritwik Giri ◽

Michael L. Seltzer ◽

Jasha Droppo ◽

Dong Yu

Keyword(s):

Neural Network ◽

Speech Recognition ◽

Deep Neural Network

On development deep neural network speech synthesis using vector quantized acoustical feature for isolated bahasa Indonesia words

2016 Conference of The Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques (O-COCOSDA) ◽

10.1109/icsda.2016.7918993 ◽

2016 ◽

Author(s):

Trikarsa Tirtadwipa Manunggal ◽

Dhany Arifianto

Keyword(s):

Neural Network ◽

Speech Synthesis ◽

Deep Neural Network ◽

Bahasa Indonesia

Adaptation of an Expressive Single Speaker Deep Neural Network Speech Synthesis System

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2018.8461888 ◽

2018 ◽

Author(s):

Jonathan Parker ◽

Yannis Stylianou ◽

Roberto Cipolla

Keyword(s):

Neural Network ◽

Speech Synthesis ◽

Deep Neural Network ◽

Synthesis System

A deep neural network based multi-task learning approach to hate speech detection

Knowledge-Based Systems ◽

10.1016/j.knosys.2020.106458 ◽

2020 ◽

Vol 210 ◽

pp. 106458

Author(s):

Prashant Kapil ◽

Asif Ekbal

Keyword(s):

Neural Network ◽

Deep Neural Network ◽

Hate Speech ◽

Learning Approach ◽

Speech Detection
