An evaluation of voice conversion with neural network spectral mapping models and WaveNet vocoder

Author(s):  
Patrick Lumban Tobing ◽  
Yi-Chiao Wu ◽  
Tomoki Hayashi ◽  
Kazuhiro Kobayashi ◽  
Tomoki Toda

This paper presents an evaluation of parallel voice conversion (VC) with neural network (NN)-based statistical models for spectral mapping and waveform generation. The NN-based architectures for spectral mapping include deep NN (DNN), deep mixture density network (DMDN), and recurrent NN (RNN) models. A WaveNet (WN) vocoder is employed as a high-quality NN-based waveform generation method. In VC, however, quality degradation still occurs owing to the oversmoothed characteristics of the estimated speech parameters. To address this problem, we utilize post-conversion of the converted features based on direct waveform modification with spectrum differential and a global variance postfilter. To preserve consistency with the post-conversion, we further propose a spectrum differential loss for the spectral modeling. The experimental results demonstrate that: (1) the RNN-based spectral modeling achieves higher accuracy with a faster convergence rate and better generalization than the DNN-/DMDN-based models; (2) the RNN-based spectral modeling also produces less oversmoothed spectral trajectories; (3) the proposed spectrum differential loss improves performance in same-gender conversions; and (4) the proposed post-conversion of converted features for the WN vocoder in VC yields the best performance in both naturalness and speaker similarity compared to the conventional use of the WN vocoder.
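The idea behind a spectrum differential loss can be illustrated with a minimal sketch: instead of penalizing the converted spectra directly, the loss is computed on frame-to-frame spectral differentials, which makes it insensitive to a constant spectral offset. The function name and the mean-squared-error form are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def spectrum_differential_loss(pred, target):
    """Hypothetical sketch of a spectrum differential loss:
    mean squared error on first-order temporal differentials
    of the spectral feature sequences (frames x dimensions)."""
    # differential along the frame (time) axis
    d_pred = np.diff(pred, axis=0)
    d_target = np.diff(target, axis=0)
    return np.mean((d_pred - d_target) ** 2)
```

Note that this loss is zero whenever the predicted trajectory differs from the target only by a constant offset, since differentiation removes the offset.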

2021 ◽  
Vol 13 (9) ◽  
pp. 1701
Author(s):  
Leonardo Bagaglini ◽  
Paolo Sanò ◽  
Daniele Casella ◽  
Elsa Cattani ◽  
Giulia Panegrossi

This paper describes the Passive microwave Neural network Precipitation Retrieval algorithm for climate applications (PNPR-CLIM), developed with funding from the Copernicus Climate Change Service (C3S), implemented by ECMWF on behalf of the European Union. The algorithm has been designed and developed to exploit the two cross-track scanning microwave radiometers, AMSU-B and MHS, towards the creation of a long-term (2000–2017) global precipitation climate data record (CDR) for the ECMWF Climate Data Store (CDS). The algorithm has been trained on an observational dataset built from one year of coincident MHS and GPM-CO Dual-frequency Precipitation Radar (DPR) observations. The dataset includes the Fundamental Climate Data Record (FCDR) of AMSU-B and MHS brightness temperatures, provided by the Fidelity and Uncertainty in Climate data records from Earth Observation (FIDUCEO) project, and the DPR-based surface precipitation rate estimates used as reference. The combined use of high-quality, calibrated, and harmonized long-term input data from the FIDUCEO FCDR with the ability of neural networks to learn and generalize has made it possible to limit the use of ancillary model-derived environmental variables, thus reducing the influence of model uncertainties on PNPR-CLIM, which could otherwise compromise the accuracy of the estimates. The PNPR-CLIM estimated precipitation distribution is in good agreement with independent DPR-based estimates. A multiscale assessment of the algorithm's performance is presented against high-quality regional ground-based radar products and global precipitation datasets. The regional and global three-year (2015–2017) verification analysis shows that, despite the simplicity of the algorithm in terms of input variables and processing, PNPR-CLIM outperforms NASA GPROF in rainfall detection, while the two are comparable in rainfall quantification. The global analysis reveals weaknesses at higher latitudes and, at mid-latitudes, in winter, mainly linked to the poorer quality of precipitation retrieval in cold/dry conditions.
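A hedged sketch of the kind of rainfall-detection verification reported above: probability of detection (POD) and false alarm ratio (FAR) of estimated rain rates against a reference product, given a rain/no-rain threshold. The function name, the 0.1 mm/h threshold, and the choice of scores are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def detection_scores(est, ref, threshold=0.1):
    """Illustrative rainfall-detection verification: POD and FAR of
    estimated rain rates (est) against a reference (ref), with a
    rain/no-rain threshold in mm/h (threshold value is an assumption)."""
    est_rain = est >= threshold
    ref_rain = ref >= threshold
    hits = np.sum(est_rain & ref_rain)          # rain detected where reference has rain
    misses = np.sum(~est_rain & ref_rain)       # reference rain not detected
    false_alarms = np.sum(est_rain & ~ref_rain) # rain detected where reference is dry
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    return pod, far
```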


2018 ◽  
Vol 8 (8) ◽  
pp. 1258 ◽  
Author(s):  
Shuming Jiao ◽  
Zhi Jin ◽  
Chenliang Chang ◽  
Changyuan Zhou ◽  
Wenbin Zou ◽  
...  

Reducing the enormous amount of data involved in the processing, storage, and transmission of digital holograms is a critical issue. In photograph compression, the JPEG standard is supported by almost every system and device; it would therefore be favorable if the JPEG standard were applicable to hologram compression, with the advantage of universal compatibility. However, the image reconstructed from a JPEG-compressed hologram suffers severe quality degradation, since some high-frequency features of the hologram are lost during compression. In this work, we employ a deep convolutional neural network to reduce the artifacts in a JPEG-compressed hologram. Simulation and experimental results reveal that the proposed “JPEG + deep learning” hologram compression scheme achieves satisfactory reconstruction results for a computer-generated phase-only hologram after compression.
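The high-frequency loss described above can be sketched in miniature: coarsely quantizing a block's frequency coefficients discards fine detail, and the reconstruction error grows with the quantization step. Here a 2-D DFT stands in for JPEG's block DCT with per-frequency quantization tables, and the function name and step size q are illustrative assumptions.

```python
import numpy as np

def jpeg_like_quantize(block, q):
    """Illustrative stand-in for JPEG-style compression of a hologram
    block: uniformly quantize the 2-D DFT coefficients with step q
    (JPEG itself uses a block DCT with per-frequency steps), then
    reconstruct the block."""
    spec = np.fft.fft2(block)
    quantized = np.round(spec / q) * q  # coarse q zeroes out weak high frequencies
    return np.real(np.fft.ifft2(quantized))
```

With a large step, most high-frequency coefficients round to zero, which is the mechanism by which hologram detail is lost under aggressive compression.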


2021 ◽  
Vol 336 ◽  
pp. 06015
Author(s):  
Guangwei Li ◽  
Shuxue Ding ◽  
Yujie Li ◽  
Kangkang Zhang

Music is closely related to human life and is an important way for people to express their feelings. Deep neural networks have played a significant role in the field of music processing, and many different network models implement deep learning for audio. General neural networks suffer from problems such as complex operation and slow computation. In this paper, we introduce Long Short-Term Memory (LSTM), a recurrent neural network, to realize end-to-end training. The network structure is simple and, after training, can generate better audio sequences. After music generation, human-voice conversion is important for music understanding and for adding lyrics to pure music. We propose an audio segmentation technique that segments the human voice into fixed-length pieces. Different notes are classified from piano music without considering the scale and are correlated with the different human voices we obtain. Finally, through this transformation, the generated piano music can be expressed through a human-voice output. Experimental results demonstrate that the proposed scheme can successfully obtain a human voice from pure piano music generated by the LSTM.
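The fixed-length segmentation step can be sketched as follows. This is a minimal illustration only: the function name, the use of non-overlapping windows, and dropping the trailing remainder are all assumptions, not the paper's stated procedure.

```python
import numpy as np

def segment_fixed(audio, seg_len):
    """Hypothetical sketch of fixed-length audio segmentation:
    split a 1-D sample array into non-overlapping segments of
    seg_len samples each, dropping any trailing remainder."""
    n_segments = len(audio) // seg_len
    return [audio[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]
```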


Author(s):  
Wenlong Li ◽  
◽  
Kaoru Hirota ◽  
Yaping Dai ◽  
Zhiyang Jia

An improved fully convolutional network for speech enhancement, based on post-processing with global variance (GV) equalization and noise-aware training (PN-FCN), is proposed. It aims to reduce the complexity of the speech enhancement system and addresses the over-smoothed spectrogram problem and poor generalization capability. The PN-FCN is fed with noisy speech samples augmented with an estimate of the noise; in this way, the network uses additional online noise information to better predict the clean speech. Besides, the PN-FCN uses global variance information, which improves the subjective score in a voice conversion task. Finally, the proposed framework adopts an FCN whose number of parameters is one-seventh that of a deep neural network (DNN). Results of experiments on the Valentini-Botinhao dataset demonstrate that the proposed framework achieves improvements in both denoising effect and model-training speed.
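Global-variance equalization, used above as a post-processing step, can be sketched as follows: each feature dimension is rescaled around its temporal mean so that its variance over time matches a target global variance, countering the over-smoothing of enhanced trajectories. The function name and the exact rescaling form are assumptions for illustration.

```python
import numpy as np

def gv_postfilter(feats, target_gv):
    """Minimal sketch of global-variance (GV) equalization: rescale each
    feature dimension of feats (frames x dimensions) around its temporal
    mean so its variance matches target_gv (one value per dimension)."""
    mean = feats.mean(axis=0)
    gv = feats.var(axis=0)
    # per-dimension scale factor; guard against zero variance
    scale = np.sqrt(target_gv / np.maximum(gv, 1e-12))
    return mean + scale * (feats - mean)
```

By construction the per-dimension mean is preserved while the variance is stretched (or shrunk) to the target, which is what restores dynamic range to over-smoothed spectrograms.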


2010 ◽  
Vol 18 (5) ◽  
pp. 954-964 ◽  
Author(s):  
Srinivas Desai ◽  
Alan W Black ◽  
B Yegnanarayana ◽  
Kishore Prahallad
