Voice conversion algorithm based on piecewise linear conversion rules of formant frequency and spectrum tilt

1995 ◽  
Vol 16 (2) ◽  
pp. 153-164 ◽  
Author(s):  
Hideyuki Mizuno ◽  
Masanobu Abe

2020 ◽  
Vol 10 (8) ◽  
pp. 2884
Author(s):  
Ki-Seung Lee

In voice conversion (VC), it is highly desirable to obtain transformed speech signals that are perceptually close to a target speaker’s voice. To this end, the proposed VC scheme adopts a perceptually meaningful criterion that takes the human auditory system into account when measuring the distance between the converted and target voices. The conversion rules for the spectral-envelope features and the pitch modification factor were constructed jointly so that the perceptual distance was minimized. This minimization problem was solved using a deep neural network (DNN) framework in which the input features and target features were derived from the source speech signals and a time-aligned version of the target speech signals, respectively. Validation tests were carried out on the CMU ARCTIC database to evaluate the effectiveness of the proposed method, especially in terms of perceptual quality. The experimental results showed that the proposed method yielded perceptually preferred results compared with independent conversion using the conventional mean-square error (MSE) criterion. The maximum improvement in the perceptual evaluation of speech quality (PESQ) score over the conventional VC method was 0.312.


2021 ◽  
Vol 11 (1) ◽  
pp. 33
Author(s):  
Yihang Chen ◽  
Zening Cao ◽  
Jinxin Wang ◽  
Yan Shi ◽  
Zilong Qin

In the process of global information construction, different fields have built their own discrete global grid systems (DGGS). With the development of big data technology, data exchange, integration, and updating have become a trend, as has the associative integration of different DGGS. Because DGGS are heterogeneous and use different encoding rules, building the encoding conversion rules and data mapping relationships for the same object across DGGS is a key technology for achieving DGGS interoperability. As a multipurpose DGGS, the quaternary triangular mesh (QTM) has become an effective spatial framework for constructing the digital earth because of its simple structure. Many QTM encoding schemes have been proposed; they have played a key role in the development of QTM, but they also make it difficult to exchange and integrate QTM data under different encodings. To solve this problem, we explore the characteristics of QTM encoding and put forward three conversion algorithms: a resampling conversion algorithm, a hierarchical conversion algorithm, and a row–column conversion algorithm.
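As a rough illustration of why hierarchical QTM codes lend themselves to conversion rules, the sketch below treats a cell address as one octant digit (0-7) followed by quaternary digits (0-3), one per subdivision level. This address layout and all function names are assumptions made for illustration; the sketch does not reproduce the resampling, hierarchical, or row–column algorithms of the paper. It only shows a round trip between a string code and a linear integer index at a fixed level.

```python
def validate(code: str) -> None:
    """Check the assumed layout: octant digit 0-7, then digits 0-3."""
    if not code or code[0] not in "01234567":
        raise ValueError("first character must be an octant digit 0-7")
    if any(ch not in "0123" for ch in code[1:]):
        raise ValueError("subdivision digits must be in 0-3")


def cell_count(level: int) -> int:
    """Number of triangles at a level: 8 octants, each split 4 ways per level."""
    return 8 * 4 ** level


def parent(code: str) -> str:
    """Parent cell: drop the last quaternary digit."""
    validate(code)
    if len(code) == 1:
        raise ValueError("an octant has no parent")
    return code[:-1]


def children(code: str) -> list[str]:
    """Four child cells: append one quaternary digit."""
    validate(code)
    return [code + d for d in "0123"]


def to_index(code: str) -> int:
    """Convert a string code to a linear index within its level."""
    validate(code)
    level = len(code) - 1
    sub = int(code[1:], 4) if level else 0
    return int(code[0]) * 4 ** level + sub


def from_index(index: int, level: int) -> str:
    """Inverse of to_index for a known level."""
    if not 0 <= index < cell_count(level):
        raise ValueError("index out of range for this level")
    octant, sub = divmod(index, 4 ** level)
    digits = ""
    for _ in range(level):
        sub, d = divmod(sub, 4)
        digits = str(d) + digits
    return str(octant) + digits


if __name__ == "__main__":
    code = "3012"  # hypothetical address: octant 3, three subdivision levels
    idx = to_index(code)
    print(code, "->", idx, "->", from_index(idx, len(code) - 1))
```

Because both representations identify the same cell at the same level, the conversion is lossless in both directions; conversions between schemes that order or orient the child triangles differently would need an additional digit-mapping step that is not shown here.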


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 196578-196586
Author(s):  
Chunhui Deng ◽  
Ying Chen ◽  
Huifang Deng

2020 ◽  
Vol 29 (3) ◽  
pp. 391-403
Author(s):  
Dania Rishiq ◽  
Ashley Harkrider ◽  
Cary Springer ◽  
Mark Hedrick

Purpose: The main purpose of this study was to evaluate aging effects on the predominantly subcortical (brainstem) encoding of the second-formant frequency transition, an essential acoustic cue for perceiving place of articulation. Method: Synthetic consonant–vowel syllables varying in second-formant onset frequency (i.e., /ba/, /da/, and /ga/ stimuli) were used to elicit speech-evoked auditory brainstem responses (speech-ABRs) in 16 young adults (M age = 21 years) and 11 older adults (M age = 59 years). Repeated-measures mixed-model analyses of variance were performed on the latencies and amplitudes of the speech-ABR peaks. Fixed factors were phoneme (repeated measures on three levels: /b/ vs. /d/ vs. /g/) and age (two levels: young vs. older). Results: Speech-ABR differences were observed between the two groups (young vs. older adults). Specifically, older listeners showed generalized amplitude reductions for onset and major peaks. Significant Phoneme × Group interactions were not observed. Conclusions: Results showed aging effects in speech-ABR amplitudes that may reflect diminished subcortical encoding of consonants in older listeners. These aging effects were not phoneme dependent as observed using the statistical methods of this study.


1991 ◽  
Vol 34 (3) ◽  
pp. 671-678 ◽  
Author(s):  
Joan E. Sussman

This investigation examined the response strategies and discrimination accuracy of adults and children aged 5–10 years as the ratio of same to different trials was varied across three conditions of a “change/no-change” discrimination task. The conditions varied as follows: (a) a ratio of one-third same to two-thirds different trials (33% same), (b) an equal ratio of same to different trials (50% same), and (c) a ratio of two-thirds same to one-third different trials (67% same). Stimuli were synthetic consonant-vowel syllables that changed along a place of articulation dimension by formant frequency transition. Results showed that all subjects changed their response strategies depending on the ratio of same-to-different trials. The most lax response pattern was observed for the 50% same condition, and the most conservative pattern was observed for the 67% same condition. Adult response patterns were the most conservative across conditions. Differences in discrimination accuracy as measured by P(C) were found, with the largest difference in the 5- to 6-year-old group and the smallest change in the adult group. These findings suggest that children’s response strategies, like those of adults, can be manipulated by changing the ratio of same-to-different trials. Furthermore, interpretation of sensitivity measures must be referenced to task variables such as the ratio of same-to-different trials.
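The dependence of P(C) on the trial ratio can be seen with a little arithmetic. The sketch below is illustrative only: the hit and correct-rejection rates are hypothetical values, not data from the study, and they are held fixed across conditions, whereas the study reports that listeners also shifted their response strategies. It computes proportion correct as the trial-weighted average of the correct-rejection rate on same trials and the hit rate on different trials.

```python
def proportion_correct(hit_rate: float,
                       correct_rejection_rate: float,
                       p_same: float) -> float:
    """P(C) in a change/no-change task.

    hit_rate: probability of responding "change" on a different trial.
    correct_rejection_rate: probability of responding "no change" on a same trial.
    p_same: share of trials that are same trials.
    """
    for value in (hit_rate, correct_rejection_rate, p_same):
        if not 0.0 <= value <= 1.0:
            raise ValueError("rates and proportions must lie in [0, 1]")
    return p_same * correct_rejection_rate + (1.0 - p_same) * hit_rate


if __name__ == "__main__":
    # Hypothetical listener with a lax criterion: many hits, many false alarms.
    hit_rate = 0.90
    correct_rejection_rate = 0.60
    for label, p_same in (("33% same", 1 / 3),
                          ("50% same", 1 / 2),
                          ("67% same", 2 / 3)):
        pc = proportion_correct(hit_rate, correct_rejection_rate, p_same)
        print(f"{label}: P(C) = {pc:.2f}")
```

With these assumed rates, the same listener scores a lower P(C) as the share of same trials grows, even though the underlying hit and correct-rejection rates have not changed, which is why P(C) has to be read against the same-to-different ratio.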

