Continuous vocoder applied in deep neural network based voice conversion

2019 ◽  
Vol 78 (23) ◽  
pp. 33549-33572
Author(s):  
Mohammed Salah Al-Radhi ◽  
Tamás Gábor Csapó ◽  
Géza Németh

Abstract: In this paper, a novel vocoder is proposed for a Statistical Voice Conversion (SVC) framework using a deep neural network, where multiple features from the speech of two speakers (source and target) are converted acoustically. Traditional conversion methods focus on the prosodic feature represented by the discontinuous fundamental frequency (F0) and on the spectral envelope. Studies have shown that speech analysis/synthesis solutions play an important role in the overall quality of the converted voice. Recently, we proposed a new continuous vocoder, originally for statistical parametric speech synthesis, in which all parameters are continuous. This work therefore introduces a new method that uses a continuous F0 (contF0) in SVC to avoid alignment errors that may occur in voiced and unvoiced segments and can degrade the converted speech. Our contributions are as follows. (1) We integrate the continuous vocoder, which provides an advanced model of the excitation signal, into the SVC framework by converting its contF0, maximum voiced frequency, and spectral features. (2) We show that a feed-forward deep neural network (FF-DNN) using our vocoder yields high-quality conversion. (3) We apply a geometric approach to spectral subtraction (GA-SS) in the final stage of the proposed framework to improve the signal-to-noise ratio of the converted speech. Our experimental results, using two male speakers and one female speaker, show that the speech converted with the proposed SVC technique is similar to the target speaker and gives state-of-the-art performance as measured by objective evaluation and subjective listening tests.
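To illustrate why a continuous F0 track avoids the voiced/unvoiced discontinuity mentioned in the abstract, the following is a minimal sketch that fills unvoiced frames by linear interpolation. It is not the contF0 estimator of the continuous vocoder, which the abstract does not specify; the function name `interpolate_f0` and the convention that unvoiced frames are marked with 0.0 are assumptions made for this example only.

```python
def interpolate_f0(f0, unvoiced=0.0):
    """Return a continuous F0 track from a discontinuous one.

    Assumption (hypothetical convention): frames equal to `unvoiced`
    carry no pitch value. They are filled by linear interpolation
    between the nearest voiced frames; leading and trailing unvoiced
    frames hold the nearest voiced value.
    """
    out = [float(v) for v in f0]
    voiced = [i for i, v in enumerate(f0) if v != unvoiced]
    if not voiced:
        return out  # nothing to interpolate from

    first, last = voiced[0], voiced[-1]
    for i in range(first):
        out[i] = out[first]
    for i in range(last + 1, len(out)):
        out[i] = out[last]

    # Fill each unvoiced gap between two consecutive voiced frames.
    for a, b in zip(voiced, voiced[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = out[a] + t * (out[b] - out[a])
    return out


if __name__ == "__main__":
    print(interpolate_f0([0.0, 100.0, 0.0, 0.0, 130.0, 0.0]))
```

With a track like this, every frame has a pitch value, so source and target frames can be paired without a separate voiced/unvoiced decision.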

Author(s):  
Wenlong Li ◽  
Kaoru Hirota ◽  
Yaping Dai ◽  
Zhiyang Jia

An improved fully convolutional network with post-processing based on global variance (GV) equalization and noise-aware training (PN-FCN) is proposed for speech enhancement. It aims to reduce the complexity of the speech enhancement system and to address two problems: overly smooth spectrograms of the enhanced speech and poor generalization capability. The PN-FCN is fed with noisy speech samples augmented with an estimate of the noise. In this way, the PN-FCN uses additional online noise information to better predict the clean speech. In addition, the PN-FCN uses global variance information, which improves the subjective score in a voice conversion task. Finally, the proposed framework adopts an FCN whose number of parameters is one-seventh that of a deep neural network (DNN). Experimental results on the Valentini-Botinhao dataset demonstrate that the proposed framework achieves improvements in both denoising performance and model training speed.
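The idea of feeding the network with noisy input "augmented with an estimate of the noise" can be sketched as follows. This is an illustrative assumption, not the PN-FCN pipeline: it assumes frame-level feature vectors, estimates the noise as the mean of the first few frames (a common heuristic when the utterance starts with non-speech), and appends that estimate to every frame. The name `noise_aware_features` and the parameter `n_noise_frames` are hypothetical.

```python
def noise_aware_features(frames, n_noise_frames=6):
    """Append a per-utterance noise estimate to every feature frame.

    Assumptions (not taken from the paper): `frames` is a list of
    equal-length feature vectors, and the first `n_noise_frames`
    frames contain noise only.
    """
    k = min(n_noise_frames, len(frames))
    if k == 0:
        return []
    dim = len(frames[0])
    # Mean of the leading frames serves as the noise estimate.
    noise = [sum(f[d] for f in frames[:k]) / k for d in range(dim)]
    # Each network input is the noisy frame followed by the estimate.
    return [list(f) + noise for f in frames]


if __name__ == "__main__":
    feats = noise_aware_features([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                                 n_noise_frames=2)
    print(feats[2])
```

The input dimension doubles, which is the price of giving the model explicit noise information at every frame.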


2020 ◽  
Vol 17 (12) ◽  
pp. 5205-5209
Author(s):  
Ali Elbialy ◽  
M. A. El-Dosuky ◽  
Ibrahim M. El-Henawy

Third generation sequencing (TGS) produces long reads but with relatively high error rates. Because of these errors, read quality in TGS is an active research topic. This paper combines and investigates three quality-related metrics: basecalling accuracy, Phred quality scores, and GC content. For basecalling accuracy, a deep neural network is adopted. The measured loss does not exceed 5.42.
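Two of the metrics named above have standard definitions that are easy to state in code: a Phred score Q corresponds to an error probability P through Q = -10 log10(P), and GC content is the fraction of bases that are G or C. The sketch below shows both, plus decoding of a FASTQ quality string under the common Phred+33 ASCII offset. The function names are chosen for this example, and nothing here reproduces the paper's own pipeline.

```python
import math


def phred_to_error_prob(q):
    """Error probability for a Phred quality score: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality score for an error probability: Q = -10 log10(P)."""
    return -10 * math.log10(p)


def decode_phred33(quality_string):
    """Decode a FASTQ quality string, assuming the Phred+33 offset."""
    return [ord(c) - 33 for c in quality_string]


def gc_content(seq):
    """Fraction of bases in `seq` that are G or C (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    print(decode_phred33("I!"))
    print(phred_to_error_prob(20))
    print(gc_content("GATTACA"))
```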


2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Lei He ◽  
Yan Xing ◽  
Kangxiong Xia ◽  
Jieqing Tan

To address a drawback of most image inpainting algorithms, namely that texture is not restored prominently, this paper proposes an adaptive inpainting algorithm based on continued fractions. To restore each damaged point, the information of known pixels around the damaged point is used to interpolate its intensity. The proposed method has two steps: first, Thiele's rational interpolation combined with the mask image is used to adaptively interpolate the intensities of damaged points and obtain an initial repaired image; then Newton-Thiele's rational interpolation is used to refine the initial repaired image into the final result. To demonstrate the advantages of the proposed algorithm, extensive experiments were conducted on damaged images. Both subjective and objective evaluation were used to assess the quality of the repaired images, with the objective evaluation based on a comparison of Peak Signal to Noise Ratios (PSNRs). The experimental results showed that the proposed algorithm achieved a better visual effect and a higher PSNR than state-of-the-art methods.
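Thiele's rational interpolation, the building block named in the first step, represents the interpolant as a continued fraction whose coefficients are inverse differences of the samples. The following is a minimal one-dimensional sketch of that textbook construction; it is not the paper's adaptive, mask-guided two-dimensional scheme, and the function names are chosen for this example. It does not handle the degenerate case where two inverse differences coincide (for example, equal neighbouring intensities), which raises `ZeroDivisionError`.

```python
def thiele_coefficients(xs, ys):
    """Inverse-difference coefficients a_0..a_{n-1} of the Thiele
    continued fraction through the points (xs[i], ys[i])."""
    n = len(xs)
    phi = [float(y) for y in ys]  # phi[i]: current-order inverse difference at xs[i]
    coeffs = [phi[0]]
    for k in range(1, n):
        # Raise entries i >= k from order k-1 to order k.
        # phi[k-1] is left untouched and equals a_{k-1}.
        for i in range(k, n):
            phi[i] = (xs[i] - xs[k - 1]) / (phi[i] - phi[k - 1])
        coeffs.append(phi[k])
    return coeffs


def thiele_eval(xs, coeffs, x):
    """Evaluate a_0 + (x - x_0) / (a_1 + (x - x_1) / (a_2 + ...))
    from the innermost term outwards."""
    result = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        result = coeffs[k] + (x - xs[k]) / result
    return result


if __name__ == "__main__":
    # Samples of f(x) = 1 / (1 + x); a rational function of this
    # degree is reproduced by the continued fraction.
    xs = [0.0, 1.0, 2.0]
    ys = [1.0 / (1.0 + x) for x in xs]
    a = thiele_coefficients(xs, ys)
    print(a)
    print(thiele_eval(xs, a, 3.0))
```

In an inpainting setting, `xs` would be the positions of known pixels along a row or column and `ys` their intensities, with the damaged pixel's position passed as `x`.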


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Jun He ◽  
Jing Wen

To improve the nursing effect in patients after thoracic surgery, this paper proposes a refined intervention method in the operating room based on traditional operating room nursing and applies this method to the nursing of patients after thoracic surgery. Moreover, this paper improves the traditional neural network algorithm and uses the deep neural network algorithm to process test data. In addition, it includes patients accepted by the hospital as samples for test analysis and formulates detailed intervention methods for the operating room. Finally, this paper collects the corresponding test data by setting up test and control groups and visually displays the data using mathematical statistics. The statistical parameters of the experiment in this paper include the quality of recovery, complications, satisfaction score, and recovery effect. The comparative test shows that the refined intervention in the operating room based on the neural network proposed in this paper can achieve a certain effect in the postoperative nursing of thoracic surgery, effectively promote the quality of recovery, and reduce the possibility of complications.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Wanli Liu

Abstract: Recently, deep neural network (DNN) studies on direction-of-arrival (DOA) estimation have attracted increasing attention. This approach offers an alternative way to deal with the DOA problem and has shown its potential in applications. However, existing works are often restricted to a previously known number of signals, equal signal-to-noise ratios (SNRs), or large angular separation between signals, which hinders their generalization in real applications. In this paper, we present a novel DNN framework that achieves higher resolution and better generalization to a random number of signals and random SNR. In simulations, the framework outperforms previous works and reaches state-of-the-art performance.

