An evaluation of voice conversion with neural network spectral mapping models and WaveNet vocoder

Author(s):  
Patrick Lumban Tobing ◽  
Yi-Chiao Wu ◽  
Tomoki Hayashi ◽  
Kazuhiro Kobayashi ◽  
Tomoki Toda

This paper presents an evaluation of parallel voice conversion (VC) with neural network (NN)-based statistical models for spectral mapping and waveform generation. The NN-based architectures for spectral mapping include deep NN (DNN), deep mixture density network (DMDN), and recurrent NN (RNN) models. A WaveNet (WN) vocoder is employed as a high-quality NN-based waveform generation method. In VC, however, quality degradation still occurs owing to the oversmoothed characteristics of the estimated speech parameters. To address this problem, we utilize post-conversion of the converted features based on direct waveform modification with spectrum differential and a global variance postfilter. To preserve consistency with the post-conversion, we further propose a spectrum differential loss for the spectral modeling. The experimental results demonstrate that: (1) the RNN-based spectral modeling achieves higher accuracy with a faster convergence rate and better generalization than the DNN-/DMDN-based models; (2) the RNN-based spectral modeling also produces less oversmoothed spectral trajectories; (3) the proposed spectrum differential loss improves performance in same-gender conversions; and (4) the proposed post-conversion of converted features for the WN vocoder in VC yields the best performance in both naturalness and speaker similarity compared to the conventional use of the WN vocoder.
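The idea behind a spectrum differential loss can be illustrated with a minimal sketch: instead of penalizing the converted spectra directly, the loss is computed on frame-to-frame spectral differentials, which makes it insensitive to a constant spectral offset. The function name and the mean-squared-error form are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def spectrum_differential_loss(pred, target):
    """Hypothetical sketch of a spectrum differential loss:
    mean squared error on first-order temporal differentials
    of the spectral feature sequences (frames x dimensions)."""
    # differential along the frame (time) axis
    d_pred = np.diff(pred, axis=0)
    d_target = np.diff(target, axis=0)
    return np.mean((d_pred - d_target) ** 2)
```

Note that this loss is zero whenever the predicted trajectory differs from the target only by a constant offset, since differentiation removes the offset.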

2021 ◽  
Vol 13 (9) ◽  
pp. 1701
Author(s):  
Leonardo Bagaglini ◽  
Paolo Sanò ◽  
Daniele Casella ◽  
Elsa Cattani ◽  
Giulia Panegrossi

This paper describes the Passive microwave Neural network Precipitation Retrieval algorithm for climate applications (PNPR-CLIM), developed with funding from the Copernicus Climate Change Service (C3S), implemented by ECMWF on behalf of the European Union. The algorithm has been designed and developed to exploit the two cross-track scanning microwave radiometers, AMSU-B and MHS, towards the creation of a long-term (2000–2017) global precipitation climate data record (CDR) for the ECMWF Climate Data Store (CDS). The algorithm has been trained on an observational dataset built from one year of coincident MHS and GPM-CO Dual-frequency Precipitation Radar (DPR) observations. The dataset includes the Fundamental Climate Data Record (FCDR) of AMSU-B and MHS brightness temperatures, provided by the Fidelity and Uncertainty in Climate data records from Earth Observation (FIDUCEO) project, and the DPR-based surface precipitation rate estimates used as reference. The combined use of high-quality, calibrated, and harmonized long-term input data from the FIDUCEO FCDR with the ability of neural networks to learn and generalize has made it possible to limit the use of ancillary model-derived environmental variables, thus reducing the influence of model uncertainties on PNPR-CLIM, which could otherwise compromise the accuracy of the estimates. The PNPR-CLIM estimated precipitation distribution is in good agreement with independent DPR-based estimates. A multiscale assessment of the algorithm's performance is presented against high-quality regional ground-based radar products and global precipitation datasets. The regional and global three-year (2015–2017) verification analysis shows that, despite the simplicity of the algorithm in terms of input variables and processing, PNPR-CLIM outperforms NASA GPROF in rainfall detection, while the two are comparable in rainfall quantification. The global analysis reveals weaknesses at higher latitudes and, at mid-latitudes, in winter, mainly linked to the poorer quality of precipitation retrieval in cold/dry conditions.
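A hedged sketch of the kind of rainfall-detection verification reported above: probability of detection (POD) and false alarm ratio (FAR) of estimated rain rates against a reference product, given a rain/no-rain threshold. The function name, the 0.1 mm/h threshold, and the choice of scores are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def detection_scores(est, ref, threshold=0.1):
    """Illustrative rainfall-detection verification: POD and FAR of
    estimated rain rates (est) against a reference (ref), with a
    rain/no-rain threshold in mm/h (threshold value is an assumption)."""
    est_rain = est >= threshold
    ref_rain = ref >= threshold
    hits = np.sum(est_rain & ref_rain)          # rain detected where reference has rain
    misses = np.sum(~est_rain & ref_rain)       # reference rain not detected
    false_alarms = np.sum(est_rain & ~ref_rain) # rain detected where reference is dry
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    return pod, far
```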


2018 ◽  
Vol 8 (8) ◽  
pp. 1258 ◽  
Author(s):  
Shuming Jiao ◽  
Zhi Jin ◽  
Chenliang Chang ◽  
Changyuan Zhou ◽  
Wenbin Zou ◽  
...  

Reducing the enormous amount of data involved in the processing, storage, and transmission of digital holograms is a critical issue. In photograph compression, the JPEG standard is supported by almost every system and device; it would therefore be favorable if the JPEG standard were applicable to hologram compression, with the advantage of universal compatibility. However, the image reconstructed from a JPEG-compressed hologram suffers severe quality degradation, since some high-frequency features of the hologram are lost during compression. In this work, we employ a deep convolutional neural network to reduce the artifacts in a JPEG-compressed hologram. Simulation and experimental results reveal that the proposed “JPEG + deep learning” hologram compression scheme achieves satisfactory reconstruction results for a computer-generated phase-only hologram after compression.
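The high-frequency loss described above can be sketched in miniature: coarsely quantizing a block's frequency coefficients discards fine detail, and the reconstruction error grows with the quantization step. Here a 2-D DFT stands in for JPEG's block DCT with per-frequency quantization tables, and the function name and step size q are illustrative assumptions.

```python
import numpy as np

def jpeg_like_quantize(block, q):
    """Illustrative stand-in for JPEG-style compression of a hologram
    block: uniformly quantize the 2-D DFT coefficients with step q
    (JPEG itself uses a block DCT with per-frequency steps), then
    reconstruct the block."""
    spec = np.fft.fft2(block)
    quantized = np.round(spec / q) * q  # coarse q zeroes out weak high frequencies
    return np.real(np.fft.ifft2(quantized))
```

With a large step, most high-frequency coefficients round to zero, which is the mechanism by which hologram detail is lost under aggressive compression.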


2021 ◽  
Vol 336 ◽  
pp. 06015
Author(s):  
Guangwei Li ◽  
Shuxue Ding ◽  
Yujie Li ◽  
Kangkang Zhang

Music is closely related to human life and is an important way for people to express their feelings. Deep neural networks have played a significant role in the field of music processing, and many different network models implement deep learning for audio. General neural networks suffer from problems such as complex operation and slow computation. In this paper, we introduce Long Short-Term Memory (LSTM), a recurrent neural network, to realize end-to-end training. The network structure is simple and, after training, can generate better audio sequences. After music generation, human-voice conversion is important for music understanding and for adding lyrics to pure music. We propose an audio segmentation technique that segments the human voice into fixed-length pieces. Different notes are classified from piano music without considering the scale and are correlated with the different human voices we obtain. Finally, through this transformation, the generated piano music can be expressed through a human-voice output. Experimental results demonstrate that the proposed scheme can successfully obtain a human voice from pure piano music generated by the LSTM.
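The fixed-length segmentation step can be sketched as follows. This is a minimal illustration only: the function name, the use of non-overlapping windows, and dropping the trailing remainder are all assumptions, not the paper's stated procedure.

```python
import numpy as np

def segment_fixed(audio, seg_len):
    """Hypothetical sketch of fixed-length audio segmentation:
    split a 1-D sample array into non-overlapping segments of
    seg_len samples each, dropping any trailing remainder."""
    n_segments = len(audio) // seg_len
    return [audio[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]
```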


Author(s):  
Wenlong Li ◽  
◽  
Kaoru Hirota ◽  
Yaping Dai ◽  
Zhiyang Jia

An improved fully convolutional network for speech enhancement, based on post-processing with global variance (GV) equalization and noise-aware training (PN-FCN), is proposed. It aims to reduce the complexity of the speech enhancement system and addresses the over-smoothed spectrogram problem and poor generalization capability. The PN-FCN is fed with noisy speech samples augmented with an estimate of the noise; in this way, the network uses additional online noise information to better predict the clean speech. Besides, the PN-FCN uses global variance information, which improves the subjective score in a voice conversion task. Finally, the proposed framework adopts an FCN whose number of parameters is one-seventh that of a deep neural network (DNN). Results of experiments on the Valentini-Botinhao dataset demonstrate that the proposed framework achieves improvements in both denoising effect and model-training speed.
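Global-variance equalization, used above as a post-processing step, can be sketched as follows: each feature dimension is rescaled around its temporal mean so that its variance over time matches a target global variance, countering the over-smoothing of enhanced trajectories. The function name and the exact rescaling form are assumptions for illustration.

```python
import numpy as np

def gv_postfilter(feats, target_gv):
    """Minimal sketch of global-variance (GV) equalization: rescale each
    feature dimension of feats (frames x dimensions) around its temporal
    mean so its variance matches target_gv (one value per dimension)."""
    mean = feats.mean(axis=0)
    gv = feats.var(axis=0)
    # per-dimension scale factor; guard against zero variance
    scale = np.sqrt(target_gv / np.maximum(gv, 1e-12))
    return mean + scale * (feats - mean)
```

By construction the per-dimension mean is preserved while the variance is stretched (or shrunk) to the target, which is what restores dynamic range to over-smoothed spectrograms.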


2010 ◽  
Vol 18 (5) ◽  
pp. 954-964 ◽  
Author(s):  
Srinivas Desai ◽  
Alan W Black ◽  
B Yegnanarayana ◽  
Kishore Prahallad
