A Modularized Neural Network with Language-Specific Output Layers for Cross-Lingual Voice Conversion

Author(s):  
Yi Zhou ◽  
Xiaohai Tian ◽  
Emre Yilmaz ◽  
Rohan Kumar Das ◽  
Haizhou Li
2014 ◽  
Author(s):  
B. Ramani ◽  
M. P. Actlin Jeeva ◽  
P. Vijayalakshmi ◽  
T. Nagarajan

2021 ◽  
Vol 336 ◽  
pp. 06015
Author(s):  
Guangwei Li ◽  
Shuxue Ding ◽  
Yujie Li ◽  
Kangkang Zhang

Music is closely related to human life and is an important way for people to express their feelings. Deep neural networks have played a significant role in the field of music processing, and many different neural network models implement deep learning for audio processing. General neural networks, however, suffer from problems such as complex operation and slow computing speed. In this paper, we introduce Long Short-Term Memory (LSTM), a recurrent neural network, to realize end-to-end training. The network structure is simple and can generate better audio sequences after training. After music generation, human voice conversion is important for music understanding and for adding lyrics to pure music. We propose an audio segmentation technique that segments the human voice into fixed-length pieces. Different notes are classified from piano music without considering the scale and are correlated with the different human voice segments obtained. Finally, through this transformation, the generated piano music can be expressed through human voice output. Experimental results demonstrate that the proposed scheme can successfully obtain a human voice from pure piano music generated by LSTM.
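The fixed-length segmentation step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the segment length, hop size, or handling of leftover samples, so the function name `segment_fixed_length`, its parameters, and the choice to drop a trailing partial segment are all assumptions made for illustration only.

```python
def segment_fixed_length(samples, segment_len, hop=None):
    """Split a 1-D sequence of samples into fixed-length segments.

    Assumption: trailing samples that do not fill a whole segment are dropped.
    With hop == segment_len (the default) the segments do not overlap.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    hop = segment_len if hop is None else hop
    if hop <= 0:
        raise ValueError("hop must be positive")
    last_start = len(samples) - segment_len
    return [samples[i:i + segment_len] for i in range(0, last_start + 1, hop)]


# Hypothetical toy signal standing in for a recorded voice waveform.
signal = list(range(10))
segments = segment_fixed_length(signal, segment_len=4)
print(segments)  # two segments of four samples; samples 8 and 9 are dropped
```

Each resulting segment could then be paired with a classified note, which is the correlation step the abstract describes; how that pairing is done is not specified in the source.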


Author(s):  
Wenlong Li ◽  
Kaoru Hirota ◽  
Yaping Dai ◽  
Zhiyang Jia

An improved fully convolutional network for speech enhancement, based on post-processing with global variance (GV) equalization and noise-aware training (PN-FCN), is proposed. It aims to reduce the complexity of the speech enhancement system, and it addresses the problems of over-smoothed speech spectrograms and poor generalization capability. The PN-FCN is fed with noisy speech samples augmented with an estimate of the noise. In this way, the PN-FCN uses additional online noise information to better predict the clean speech. In addition, the PN-FCN uses global variance information, which improves the subjective score in a voice conversion task. Finally, because the proposed framework adopts an FCN, its number of parameters is one-seventh that of a deep neural network (DNN). Experimental results on the Valentini-Botinhao dataset demonstrate that the proposed framework achieves improvements in both denoising effect and model training speed.
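The idea of feeding the network noisy speech "augmented with an estimate of the noise" can be sketched as follows. The abstract does not say how the noise is estimated, so this sketch assumes a common simplification: the first few frames are treated as noise only and averaged. The function names `estimate_noise` and `augment_with_noise` and the parameter `num_noise_frames` are hypothetical.

```python
def estimate_noise(frames, num_noise_frames=2):
    """Average the first few feature frames.

    Assumption: these leading frames contain noise only (no speech).
    """
    head = frames[:num_noise_frames]
    if not head:
        raise ValueError("need at least one frame to estimate the noise")
    dim = len(head[0])
    return [sum(frame[d] for frame in head) / len(head) for d in range(dim)]


def augment_with_noise(frames, noise):
    """Append the same noise estimate to every frame's feature vector."""
    return [list(frame) + list(noise) for frame in frames]


# Hypothetical two-dimensional feature frames; the first two are "noise only".
noisy = [[0.25, 0.75], [0.75, 0.25], [1.0, 2.0]]
noise = estimate_noise(noisy)
augmented = augment_with_noise(noisy, noise)
print(augmented[2])  # the original features followed by the noise estimate
```

The augmented vectors are what a noise-aware model would receive as input; the network architecture and the GV equalization post-processing are outside the scope of this sketch.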


2019 ◽  
Vol 78 (23) ◽  
pp. 33549-33572
Author(s):  
Mohammed Salah Al-Radhi ◽  
Tamás Gábor Csapó ◽  
Géza Németh

In this paper, a novel vocoder is proposed for a Statistical Voice Conversion (SVC) framework using a deep neural network, where multiple features from the speech of two speakers (source and target) are converted acoustically. Traditional conversion methods focus on the prosodic feature represented by the discontinuous fundamental frequency (F0) and the spectral envelope. Studies have shown that speech analysis/synthesis solutions play an important role in the overall quality of the converted voice. Recently, we proposed a new continuous vocoder, originally for statistical parametric speech synthesis, in which all parameters are continuous. This work therefore introduces a new method that uses a continuous F0 (contF0) in SVC to avoid alignment errors that may occur in voiced and unvoiced segments and can degrade the converted speech. Our contributions are as follows. (1) We integrate into the SVC framework the continuous vocoder, which provides an advanced model of the excitation signal, by converting its contF0, maximum voiced frequency, and spectral features. (2) We show that the feed-forward deep neural network (FF-DNN) using our vocoder yields high-quality conversion. (3) We apply a geometric approach to spectral subtraction (GA-SS) in the final stage of the proposed framework to improve the signal-to-noise ratio of the converted speech. Our experimental results, using two male speakers and one female speaker, show that the converted speech produced by the proposed SVC technique is similar to the target speaker and gives state-of-the-art performance as measured by objective evaluation and subjective listening tests.
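To make the contrast between a discontinuous and a continuous F0 contour concrete, here is a small sketch. It is not the contF0 estimator used by the authors, which the abstract does not describe; it swaps in plain linear interpolation across unvoiced frames, a common simplification. The convention that an unvoiced frame is marked with 0 and the function name `interpolate_unvoiced` are assumptions.

```python
def interpolate_unvoiced(f0):
    """Return a gap-free F0 contour.

    Assumption: unvoiced frames are marked with 0. Gaps between voiced
    frames are filled by linear interpolation; leading and trailing
    unvoiced frames copy the nearest voiced value.
    """
    voiced = [i for i, value in enumerate(f0) if value > 0]
    out = list(f0)
    if not voiced:
        return out  # nothing to interpolate from
    first, last = voiced[0], voiced[-1]
    for i in range(first):
        out[i] = f0[first]
    for i in range(last + 1, len(f0)):
        out[i] = f0[last]
    for left, right in zip(voiced, voiced[1:]):
        gap = right - left
        for i in range(left + 1, right):
            t = (i - left) / gap
            out[i] = f0[left] + t * (f0[right] - f0[left])
    return out


# Hypothetical contour in Hz with unvoiced frames at both ends and in the middle.
f0 = [0.0, 100.0, 0.0, 0.0, 0.0, 140.0, 0.0]
cont_f0 = interpolate_unvoiced(f0)
print(cont_f0)
```

Because every frame now carries a value, frame-by-frame alignment between source and target no longer has to deal with voiced/unvoiced mismatches, which is the motivation the abstract gives for a continuous F0.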


2014 ◽  
Vol 513-517 ◽  
pp. 738-741
Author(s):  
Ying Jian Lin ◽  
Xiao Ji Chen

BP neural networks are widely applied in character recognition, pattern classification, text and voice conversion, image compression, decision support, and other areas. In view of the problems that arise in practical applications, this paper studies the learning algorithm and its software implementation. The study of the learning algorithm covers three aspects: it explains the basic ideas of the BP algorithm, designs a three-layer BP network structure, and gives a mathematical model that describes the algorithm precisely. The study of the software implementation covers two aspects: a storage structure is designed in which all neurons of the network model are organized as linked lists, and the software flow is designed and implemented in four steps. A software implementation of the BP algorithm is basic groundwork for applying BP neural networks; using the results of this paper, the user can easily design and simulate neural networks.
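A three-layer BP (backpropagation) network of the kind the abstract describes can be sketched in a few lines. The abstract gives no layer sizes, activation function, or learning rate, so the sigmoid activation, squared-error loss, class name `ThreeLayerBP`, and all numeric settings below are assumptions. The sketch also stores weights in nested Python lists, not in the linked-list structure the paper designs.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerBP:
    """Input, hidden, and output layers trained with plain backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # The extra last column in each weight row is the bias weight.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w_hidden]
        hb = hidden + [1.0]
        output = [sigmoid(sum(w * v for w, v in zip(row, hb)))
                  for row in self.w_out]
        return hidden, output

    def train_step(self, x, target, lr=0.5):
        """One gradient step on E = 0.5 * sum((o - t)^2); returns E before the update."""
        hidden, output = self.forward(x)
        xb = list(x) + [1.0]
        hb = hidden + [1.0]
        # Error signal at the output layer.
        delta_out = [(o - t) * o * (1.0 - o) for o, t in zip(output, target)]
        # Error signal propagated back to the hidden layer (old weights).
        delta_hidden = [
            h * (1.0 - h) * sum(delta_out[k] * self.w_out[k][j]
                                for k in range(len(delta_out)))
            for j, h in enumerate(hidden)
        ]
        for k, row in enumerate(self.w_out):
            for j in range(len(row)):
                row[j] -= lr * delta_out[k] * hb[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(row)):
                row[i] -= lr * delta_hidden[j] * xb[i]
        return 0.5 * sum((o - t) ** 2 for o, t in zip(output, target))


# Hypothetical single training pair, repeated to show the error shrinking.
net = ThreeLayerBP(n_in=2, n_hidden=3, n_out=1)
x, target = [1.0, 0.0], [1.0]
initial_error = net.train_step(x, target)
for _ in range(200):
    final_error = net.train_step(x, target)
print(initial_error, final_error)
```

The forward pass and the two delta computations correspond to the standard mathematical model of backpropagation; a linked-list storage layout, as in the paper, would change how the weights are traversed but not these update rules.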

