A Modularized Neural Network with Language-Specific Output Layers for Cross-Lingual Voice Conversion

Music is closely tied to human life and is an important way for people to express their feelings. Deep neural networks have played a significant role in music processing, and many different network models have been applied to audio. General-purpose neural networks, however, suffer from complex operation and slow computation. In this paper, we introduce Long Short-Term Memory (LSTM), a recurrent neural network, to realize end-to-end training. The network structure is simple and generates better audio sequences once the model is trained. After music generation, human voice conversion is important for music understanding and for adding lyrics to pure music. We propose an audio segmentation technique that cuts the human voice into fixed-length segments. Notes in the piano music are classified without regard to scale and are matched with the different human voice segments obtained. Finally, through this transformation, the generated piano music can be rendered as human voice output. Experimental results demonstrate that the proposed scheme can successfully obtain a human voice from pure piano music generated by LSTM.
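The fixed-length segmentation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the segment length or say how a trailing partial segment is handled, so the function name `segment_fixed_length`, the `segment_len` parameter, and the zero-padding of the last segment are all assumptions made for illustration.

```python
from typing import List, Sequence


def segment_fixed_length(
    samples: Sequence[float],
    segment_len: int,
    pad_value: float = 0.0,
) -> List[List[float]]:
    """Split a sample sequence into consecutive segments of equal length.

    The final segment is padded with pad_value if it is too short
    (an assumption; the abstract does not describe this case).
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    segments: List[List[float]] = []
    for start in range(0, len(samples), segment_len):
        chunk = list(samples[start:start + segment_len])
        if len(chunk) < segment_len:
            # Pad the trailing partial segment up to the fixed length.
            chunk.extend([pad_value] * (segment_len - len(chunk)))
        segments.append(chunk)
    return segments


if __name__ == "__main__":
    voice = [0.1, 0.2, 0.3, 0.4, 0.5]
    for seg in segment_fixed_length(voice, segment_len=2):
        print(seg)
```

Each resulting segment could then be associated with one classified note, which is the pairing the abstract describes.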

An Improved Fully Convolutional Network Based on Post-Processing with Global Variance Equalization and Noise-Aware Training for Speech Enhancement

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽ 10.20965/jaciii.2021.p0130 ◽ 2021 ◽ Vol 25 (1) ◽ pp. 130-137

Author(s): Wenlong Li ◽ Kaoru Hirota ◽ Yaping Dai ◽ Zhiyang Jia

Keyword(s): Neural Network ◽ Speech Enhancement ◽ Deep Neural Network ◽ Voice Conversion ◽ Post Processing ◽ Generalization Capability ◽ Convolutional Network ◽ Fully Convolutional Network ◽ Subjective Score ◽ Model Training

An improved fully convolutional network based on post-processing with global variance (GV) equalization and noise-aware training (PN-FCN) is proposed for speech enhancement. It aims to reduce the complexity of the speech enhancement system and to address two problems: overly smooth speech spectrograms and poor generalization capability. The PN-FCN is fed with noisy speech samples augmented with an estimate of the noise. In this way, it uses additional online noise information to better predict the clean speech. In addition, PN-FCN uses global variance information, which improves the subjective score in a voice conversion task. Finally, the proposed framework adopts an FCN whose parameter count is one-seventh that of a deep neural network (DNN). Experiments on the Valentini-Botinhao dataset demonstrate that the proposed framework improves both the denoising effect and the model training speed.
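The two ideas named in the abstract, augmenting the input with a noise estimate and equalizing global variance in post-processing, can be sketched as follows. This is not the authors' implementation. It assumes the noise is estimated as the mean of the first few frames and that a single dimension-independent variance is matched to a reference value. The names `estimate_noise`, `augment_with_noise`, `equalize_global_variance`, and the `n_init` parameter are hypothetical.

```python
from statistics import fmean, pvariance
from typing import List

Frame = List[float]


def estimate_noise(frames: List[Frame], n_init: int = 6) -> Frame:
    """Average the first n_init frames, assumed here to contain only noise."""
    if not frames:
        raise ValueError("frames must not be empty")
    head = frames[:n_init]
    dim = len(frames[0])
    return [fmean([f[d] for f in head]) for d in range(dim)]


def augment_with_noise(frames: List[Frame], n_init: int = 6) -> List[Frame]:
    """Append the noise estimate to every frame (noise-aware input)."""
    noise = estimate_noise(frames, n_init)
    return [list(f) + noise for f in frames]


def equalize_global_variance(
    estimated: List[Frame], reference_variance: float
) -> List[Frame]:
    """Rescale enhanced features around their mean so that their global
    variance matches reference_variance (counteracts over-smoothing)."""
    values = [v for f in estimated for v in f]
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    if var == 0.0:
        return [list(f) for f in estimated]  # nothing to rescale
    scale = (reference_variance / var) ** 0.5
    return [[mean + scale * (v - mean) for v in f] for f in estimated]


if __name__ == "__main__":
    noisy = [[1.0, 3.0], [3.0, 5.0], [10.0, 10.0]]
    print(augment_with_noise(noisy, n_init=2))
    print(equalize_global_variance([[0.0, 2.0], [4.0, 6.0]], 20.0))
```

In this sketch the augmented frames would be the network input, and the variance equalization would be applied to the network output.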

Continuous vocoder applied in deep neural network based voice conversion

Multimedia Tools and Applications ◽ 10.1007/s11042-019-08198-5 ◽ 2019 ◽ Vol 78 (23) ◽ pp. 33549-33572

Author(s): Mohammed Salah Al-Radhi ◽ Tamás Gábor Csapó ◽ Géza Németh

Keyword(s): Neural Network ◽ Speech Synthesis ◽ Deep Neural Network ◽ Signal To Noise Ratio ◽ Geometric Approach ◽ Objective Evaluation ◽ Voice Conversion ◽ Listening Tests ◽ Alignment Errors

In this paper, a novel vocoder is proposed for a Statistical Voice Conversion (SVC) framework using a deep neural network, where multiple features from the speech of two speakers (source and target) are converted acoustically. Traditional conversion methods focus on the prosodic feature represented by the discontinuous fundamental frequency (F0) and the spectral envelope. Studies have shown that speech analysis/synthesis solutions play an important role in the overall quality of the converted voice. Recently, we have proposed a new continuous vocoder, originally for statistical parametric speech synthesis, in which all parameters are continuous. This work therefore introduces a new method that uses a continuous F0 (contF0) in SVC to avoid alignment errors that may occur in voiced and unvoiced segments and degrade the converted speech. Our contributions are as follows. (1) We integrate the continuous vocoder, which provides an advanced model of the excitation signal, into the SVC framework by converting its contF0, maximum voiced frequency, and spectral features. (2) We show that the feed-forward deep neural network (FF-DNN) using our vocoder yields high-quality conversion. (3) We apply a geometric approach to spectral subtraction (GA-SS) in the final stage of the proposed framework to improve the signal-to-noise ratio of the converted speech. Our experimental results, using two male speakers and one female speaker, show that the converted speech produced by the proposed SVC technique is similar to the target speaker and gives state-of-the-art performance as measured by objective evaluation and subjective listening tests.
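The sketch below illustrates the difference between a discontinuous and a continuous F0 contour. It is not the contF0 estimator of the continuous vocoder, whose algorithm the abstract does not describe. It only shows the simpler, commonly used idea of filling unvoiced frames (marked here with 0, an assumed convention) by linear interpolation between neighbouring voiced frames. The function name `interpolate_f0` is hypothetical.

```python
from typing import List, Sequence


def interpolate_f0(f0: Sequence[float]) -> List[float]:
    """Fill unvoiced frames (value <= 0) so the contour has no gaps.

    Interior gaps are linearly interpolated; leading and trailing
    unvoiced frames copy the nearest voiced value.
    """
    voiced = [i for i, v in enumerate(f0) if v > 0]
    if not voiced:
        return [0.0] * len(f0)  # fully unvoiced: nothing to interpolate
    out = [float(v) for v in f0]
    first, last = voiced[0], voiced[-1]
    for i in range(first):
        out[i] = out[first]
    for i in range(last + 1, len(out)):
        out[i] = out[last]
    for left, right in zip(voiced, voiced[1:]):
        gap = right - left
        for i in range(left + 1, right):
            t = (i - left) / gap
            out[i] = out[left] + t * (out[right] - out[left])
    return out


if __name__ == "__main__":
    discontinuous = [0, 100, 0, 0, 130, 0]
    print(interpolate_f0(discontinuous))
```

With a gap-free contour, every frame carries an F0 value, so source and target frames can be paired without a separate voiced/unvoiced decision.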

Many-to-many Cross-lingual Voice Conversion with a Jointly Trained Speaker Embedding Network

2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) ◽ 10.1109/apsipaasc47483.2019.9023277 ◽ 2019

Author(s): Yi Zhou ◽ Xiaohai Tian ◽ Rohan Kumar Das ◽ Haizhou Li

Keyword(s): Voice Conversion ◽ Cross Lingual

Neural-Network Lexical Translation for Cross-lingual IR from Text and Speech

Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR'19 ◽ 10.1145/3331184.3331222 ◽ 2019 ◽ Cited By ~ 3

Author(s): Rabih Zbib ◽ Lingjun Zhao ◽ Damianos Karakos ◽ William Hartmann ◽ Jay DeYoung ◽ ...

Keyword(s): Neural Network ◽ Cross Lingual

BP Neural Network Learning Algorithm and its Software Implementation

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.513-517.738 ◽ 2014 ◽ Vol 513-517 ◽ pp. 738-741 ◽ Cited By ~ 2

Author(s): Ying Jian Lin ◽ Xiao Ji Chen

Keyword(s): Neural Network ◽ Bp Neural Network ◽ Learning Algorithm ◽ Bp Algorithm ◽ Software Implementation ◽ Voice Conversion ◽ Bp Network ◽ Widespread Application ◽ Neural Network Learning ◽ Basic Work

BP neural networks are widely applied in character recognition, pattern classification, text and voice conversion, image compression, decision support, and other areas. In view of the problems that arise in practical applications, this paper studies the learning algorithm and its software implementation. The study of the learning algorithm covers three aspects: it illustrates the basic idea of the BP algorithm, designs a three-layer BP network structure, and gives a mathematical model that describes the algorithm precisely. The study of the software implementation covers two aspects: a storage structure is designed in which all neurons of the network model are organized as a linked list, and the software process is designed and divided into four implementation steps. A software implementation of the BP algorithm is basic groundwork for applying BP neural networks; using the results of this paper, users can easily design and simulate neural networks.
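A three-layer BP (backpropagation) network of the kind the abstract refers to can be sketched in a few lines. This is a generic textbook formulation, not the paper's design. It assumes sigmoid activations, a squared-error loss, and per-sample gradient descent, and it stores weights in nested lists rather than the linked-list structure the paper describes. The class name `ThreeLayerBP` and its parameters are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerBP:
    """Input layer, one hidden layer, output layer; sigmoid units."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int, seed: int = 0):
        rng = random.Random(seed)
        # One row per neuron; the last entry of each row is the bias weight.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x: Sequence[float]) -> Tuple[List[float], List[float]]:
        xin = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xin)))
                  for row in self.w_hidden]
        hin = hidden + [1.0]
        output = [sigmoid(sum(w * v for w, v in zip(row, hin)))
                  for row in self.w_out]
        return hidden, output

    def train_step(self, x: Sequence[float], target: Sequence[float],
                   lr: float = 0.5) -> float:
        hidden, output = self.forward(x)
        # Error terms: derivative of 0.5*(y - t)^2 through the sigmoid.
        delta_out = [(y - t) * y * (1 - y) for y, t in zip(output, target)]
        # Propagate the output error back through the (not yet updated) weights.
        delta_hidden = [
            h * (1 - h) * sum(d * self.w_out[k][j]
                              for k, d in enumerate(delta_out))
            for j, h in enumerate(hidden)
        ]
        for k, d in enumerate(delta_out):
            for j, v in enumerate(hidden + [1.0]):
                self.w_out[k][j] -= lr * d * v
        for j, d in enumerate(delta_hidden):
            for i, v in enumerate(list(x) + [1.0]):
                self.w_hidden[j][i] -= lr * d * v
        return 0.5 * sum((y - t) ** 2 for y, t in zip(output, target))


if __name__ == "__main__":
    # Logical OR as a toy training set (an illustrative choice).
    data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
            ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
    net = ThreeLayerBP(2, 3, 1)
    for _ in range(2000):
        for x, t in data:
            net.train_step(x, t)
    for x, _ in data:
        print(x, net.forward(x)[1])
```

The hidden-layer error terms are computed before any weight is changed, so each update uses the gradient of the same forward pass.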

Investigation of Using Disentangled and Interpretable Representations for One-shot Cross-lingual Voice Conversion

10.21437/interspeech.2018-2525 ◽ 2018 ◽ Cited By ~ 1

Author(s): Seyed Hamidreza Mohammadi ◽ Taehwan Kim

Keyword(s): Voice Conversion ◽ Cross Lingual

A Modularized Neural Network with Language-Specific Output Layers for Cross-Lingual Voice Conversion

Mandarin-Tibetan Cross-Lingual Voice Conversion System Based on Deep Neural Network

A New HMM-Based Voice Conversion Methodology Evaluated on Monolingual and Cross-Lingual Conversion Tasks

Cross-lingual voice conversion-based polyglot speech synthesizer for indian languages

Music generation and human voice conversion based on LSTM

An Improved Fully Convolutional Network Based on Post-Processing with Global Variance Equalization and Noise-Aware Training for Speech Enhancement

Continuous vocoder applied in deep neural network based voice conversion

Many-to-many Cross-lingual Voice Conversion with a Jointly Trained Speaker Embedding Network

Neural-Network Lexical Translation for Cross-lingual IR from Text and Speech

BP Neural Network Learning Algorithm and its Software Implementation

Investigation of Using Disentangled and Interpretable Representations for One-shot Cross-lingual Voice Conversion