Voice Conversion
Recently Published Documents

TOTAL DOCUMENTS: 887 (five years: 291)
H-INDEX: 36 (five years: 7)

2022 ◽  
pp. 61-77
Author(s):  
Jie Lien ◽  
Md Abdullah Al Momin ◽  
Xu Yuan

Voice assistant systems (e.g., Siri, Alexa) have attracted wide research attention. However, such systems can receive voice input from malicious sources. Recent work has demonstrated that voice authentication systems are vulnerable to several types of attacks, which fall into two main categories: spoofing attacks and hidden voice commands. This chapter explores how to launch and defend against such attacks. Spoofing attacks take four main forms: replay attacks, impersonation attacks, speech synthesis attacks, and voice conversion attacks. Although these attacks can be effective against speech recognition systems, they are easily identified by human listeners. Hidden voice commands, which are designed to evade human detection, have therefore attracted considerable research interest in recent years.
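For quick reference, the chapter's two-level categorization can be written down as a small lookup table; a minimal Python sketch (the identifier names are illustrative, not from the chapter):

```python
# Two-level taxonomy of attacks on voice authentication systems, as
# categorized above. Names are illustrative only.
VOICE_ATTACK_TAXONOMY = {
    "spoofing": [
        "replay",            # play back a recording of the genuine speaker
        "impersonation",     # a human mimics the target speaker
        "speech_synthesis",  # generate the target voice with TTS
        "voice_conversion",  # transform another voice into the target's
    ],
    # Hidden voice commands drive the recognizer while remaining hard
    # for human listeners to notice; the chapter treats them as a
    # separate top-level category rather than a kind of spoofing.
    "hidden_voice_commands": [],
}
```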


2022 ◽  
Vol 70 (2) ◽  
pp. 4027-4051
Author(s):  
Palli Padmini ◽  
C. Paramasivam ◽  
G. Jyothish Lal ◽  
Sadeen Alharbi ◽  
Kaustav Bhowmick
Keyword(s):  

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Mourad Talbi ◽  
Med Salim Bouhlel

Speech enhancement has gained considerable attention in applications such as speech transmission over communication channels, speaker identification, speech-based biometric systems, video conferencing, hearing aids, mobile phones, voice conversion, and microphones. Handling background noise is essential when designing a successful speech enhancement system. In this work, a new speech enhancement technique based on the Stationary Bionic Wavelet Transform (SBWT) and the Minimum Mean Square Error (MMSE) estimate of spectral amplitude is proposed. The technique first applies the SBWT to the noisy speech signal in order to obtain eight noisy wavelet coefficient subbands. Each of these subbands is then denoised by applying the MMSE spectral amplitude estimator. Finally, the inverse transform, SBWT⁻¹, is applied to the denoised stationary wavelet coefficients to obtain the enhanced speech signal. The proposed technique's performance is evaluated using the Signal-to-Noise Ratio (SNR), the Segmental SNR (SSNR), and the Perceptual Evaluation of Speech Quality (PESQ).
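The overall analyze-denoise-reconstruct pipeline can be sketched in a few lines. A minimal Python sketch under loud assumptions: the SBWT is not available in standard libraries, so PyWavelets' stationary wavelet transform (pywt.swt) stands in for it, and simple soft thresholding stands in for the paper's MMSE spectral amplitude estimator:

```python
import numpy as np
import pywt  # PyWavelets

def enhance(noisy, wavelet="db4", level=3):
    """Denoise a 1-D speech signal: forward SWT -> per-subband
    denoising -> inverse SWT (the SBWT^-1 step in the paper)."""
    # pywt.swt requires the length to be divisible by 2**level.
    n = len(noisy)
    pad = (-n) % (2 ** level)
    x = np.pad(noisy, (0, pad))

    # Forward stationary transform: one (approx, detail) pair per level.
    coeffs = pywt.swt(x, wavelet, level=level)

    # Noise level from the finest detail band (median rule), then soft
    # thresholding of every subband -- a stand-in for MMSE estimation.
    sigma = np.median(np.abs(coeffs[-1][1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))
    denoised = [(pywt.threshold(cA, thr, mode="soft"),
                 pywt.threshold(cD, thr, mode="soft"))
                for cA, cD in coeffs]

    # Inverse transform and trim the padding.
    return pywt.iswt(denoised, wavelet)[:n]

def snr_db(clean, estimate):
    """Global SNR in dB, one of the paper's evaluation metrics."""
    noise = clean - estimate
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))
```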


Author(s):  
Kun Zhou ◽  
Berrak Sisman ◽  
Rui Liu ◽  
Haizhou Li
Keyword(s):  

2021 ◽  
Vol 141 (12) ◽  
pp. 1267-1268
Author(s):  
Hiroki Nonoyama ◽  
Chifumi Suzuki ◽  
Takanori Nishino

Author(s):  
Wei-Zhong Zheng ◽  
Ji-Yan Han ◽  
Chen-Kai Lee ◽  
Yu-Yi Lin ◽  
Shu-Han Chang ◽  
...  

Author(s):  
Fangkun Liu ◽  
Hui Wang ◽  
Renhua Peng ◽  
Chengshi Zheng ◽  
Xiaodong Li

Abstract
Voice conversion transforms the voice of a source speaker into that of a target speaker while keeping the linguistic content unchanged. Recently, one-shot voice conversion has become a hot topic because of its potentially wide range of applications: it can convert the voice of any source speaker to that of any target speaker, even when both are unseen during training. Although great progress has been made in one-shot voice conversion, the naturalness of the converted speech remains a challenging problem. To further improve naturalness, this paper proposes a two-level nested U-structure (U2-Net) voice conversion algorithm called U2-VC. The U2-Net extracts both local and multi-scale features from the log-mel spectrogram, which helps the model learn the time-frequency structures of the source and target speech. Moreover, sandwich adaptive instance normalization (SaAdaIN) is adopted in the decoder for speaker identity transformation, retaining more content information from the source speech while maintaining speaker similarity between the converted speech and the target speech. Experiments on the VCTK dataset show that U2-VC outperforms many state-of-the-art approaches, including AGAIN-VC and AdaIN-VC, in both objective and subjective measurements.
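To make the normalization step concrete, below is a minimal PyTorch sketch of plain adaptive instance normalization (AdaIN), the mechanism that SaAdaIN extends; the exact sandwich formulation is specific to the paper and is not reproduced here:

```python
import torch

def adain(content: torch.Tensor, style: torch.Tensor,
          eps: float = 1e-5) -> torch.Tensor:
    """content, style: (batch, channels, frames) feature maps.

    Per-channel normalization strips the source speaker's statistics
    from the content features; rescaling with the style features'
    statistics injects the target speaker's identity.
    """
    c_mean = content.mean(dim=2, keepdim=True)
    c_std = content.std(dim=2, keepdim=True) + eps
    s_mean = style.mean(dim=2, keepdim=True)
    s_std = style.std(dim=2, keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean

# Usage: 80-bin log-mel features; frame counts need not match because
# the style branch is reduced to per-channel statistics.
src = torch.randn(1, 80, 200)  # source-content features
tgt = torch.randn(1, 80, 150)  # target-speaker features
converted = adain(src, tgt)    # shape (1, 80, 200)
```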


Author(s):  
Aakshi Mittal ◽  
Mohit Dua

Abstract
Spoof detection is essential for improving the performance of current Automatic Speaker Verification (ASV) systems, and strengthening both the frontend and the backend helps build robust ASV systems. First, this paper compares the performance of static and static–dynamic Constant Q Cepstral Coefficient (CQCC) frontend features, using a Long Short-Term Memory (LSTM) model with Time Distributed wrappers at the backend. Second, it compares ASV systems built with three deep learning backends, LSTM with Time Distributed wrappers, plain LSTM, and a Convolutional Neural Network (CNN), all using static–dynamic CQCC frontend features. Third, it presents two spoof detection systems for ASV that use the same static–dynamic CQCC frontend features with different combinations of deep learning models at the backend. The first is a voting-protocol-based two-level spoof detection system that uses CNN and LSTM models at the first level and an LSTM with Time Distributed wrappers at the second level. The second is a two-level spoof detection system with a user identification and verification protocol, which uses an LSTM model for user identification at the first level and an LSTM with Time Distributed wrappers for verification at the second level. For the proposed work, a variation of the ASVspoof 2019 dataset is used so that all types of spoofing attacks, namely Speech Synthesis (SS), Voice Conversion (VC), and replay, appear in a single dataset. The results show that, at the frontend, static–dynamic CQCC features outperform static CQCC features and, at the backend, hybrid combinations of deep learning models increase the accuracy of the spoof detection systems.
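Below is a minimal Keras sketch of the recurring backend building block, an LSTM followed by a TimeDistributed dense layer, classifying CQCC feature sequences as genuine or spoofed. The layer sizes and the 90-dimensional static–dynamic CQCC input (30 static + 30 delta + 30 delta-delta coefficients) are assumptions for illustration, not the paper's exact configuration:

```python
from tensorflow.keras import layers, models

def build_lstm_td(feat_dim=90, num_classes=2):
    """LSTM + TimeDistributed backend over variable-length
    CQCC frame sequences."""
    model = models.Sequential([
        layers.Input(shape=(None, feat_dim)),             # (frames, CQCC)
        layers.LSTM(128, return_sequences=True),          # temporal model
        layers.TimeDistributed(layers.Dense(64, activation="relu")),
        layers.GlobalAveragePooling1D(),                  # pool over time
        layers.Dense(num_classes, activation="softmax"),  # genuine/spoof
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

In the paper's two-level systems, a model like this occupies the second level; the first level is either a CNN/LSTM voting stage or an LSTM-based user identification stage.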

