HMM-based speech enhancement using pitch period information in voiced speech segments

Author(s):  
S. Oberle ◽  
A. Kaelin
2015 ◽  
Vol 39 (2) ◽  
pp. 231-242 ◽  
Author(s):  
Marek Blok ◽  
Piotr Drózda

Abstract: This paper presents a sample rate conversion algorithm that allows the resampling ratio to change continuously. The proposed implementation is based on a variable fractional delay filter implemented as a Farrow structure. The coefficients of this structure are computed from fractional delay filters designed with the offset window method. The approach allows the instantaneous resampling ratio to be changed freely during processing. With such an algorithm, one can simulate the recording of audio on magnetic tape running at nonuniform velocity, as well as remove such distortions. The capabilities of the approach are demonstrated on speech signal processing, with a resampling ratio computed from the estimated fundamental frequency of voiced speech segments.
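
The idea of resampling with a Farrow structure and a freely varying ratio can be illustrated with a minimal sketch. It is not the authors' implementation: the polynomial coefficients below come from cubic Lagrange interpolation rather than from the offset window method named in the abstract, and the names `farrow_cubic`, `resample_variable`, and `ratio_fn` are hypothetical.

```python
import math


def farrow_cubic(x, n, mu):
    """Interpolate between x[n] and x[n+1] at fractional offset mu in [0, 1).

    Assumption: cubic Lagrange coefficients stand in for the offset-window
    fractional delay filters described in the abstract.
    """
    xm1, x0, x1, x2 = x[n - 1], x[n], x[n + 1], x[n + 2]
    # Each c_i is the output of one fixed FIR branch of the Farrow structure.
    c0 = x0
    c1 = -xm1 / 3 - x0 / 2 + x1 - x2 / 6
    c2 = xm1 / 2 - x0 + x1 / 2
    c3 = -xm1 / 6 + x0 / 2 - x1 / 2 + x2 / 6
    # The branches are combined by a polynomial in mu (Horner's rule).
    return ((c3 * mu + c2) * mu + c1) * mu + c0


def resample_variable(x, ratio_fn):
    """Resample x; ratio_fn(k) is the output/input rate ratio at output sample k."""
    out = []
    t = 1.0  # read position in input samples; start at 1 so x[n-1] exists
    k = 0
    while t < len(x) - 2:  # keep x[n+2] inside the signal
        n = int(t)
        out.append(farrow_cubic(x, n, t - n))
        t += 1.0 / ratio_fn(k)  # the ratio may change at every output sample
        k += 1
    return out


# Example: a sine resampled with a slowly wobbling ratio (hypothetical values).
signal = [math.sin(2 * math.pi * 0.01 * i) for i in range(400)]
wobbled = resample_variable(signal, lambda k: 1.0 + 0.05 * math.sin(0.02 * k))
```

Because only the fractional offset `mu` changes from sample to sample while the branch filters stay fixed, the ratio can be updated at every output sample without redesigning a filter.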


2007 ◽  
Vol 2007 ◽  
pp. 1-5 ◽  
Author(s):  
Aïcha Bouzid ◽  
Noureddine Ellouze

This paper describes a multiscale product method (MPM) for measuring the open quotient in voiced speech. The method is based on determining the glottal closing and opening instants. The proposed approach multiplies the wavelet transforms of the speech signal at different scales in order to enhance edge detection and parameter estimation. We show that the method is effective and robust for detecting speech singularities. Accurate estimation of glottal closing instants (GCIs) and glottal opening instants (GOIs) is important in a wide range of speech processing tasks. In this paper, it is used to measure the local open quotient (Oq), which is the ratio of the open time to the pitch period. The multiscale product operates automatically on the speech signal; the reference electroglottogram (EGG) signal is used for performance evaluation. The rate of correct GCI detection is 95.5% and that of GOI detection is 76%. The pitch period relative error is 2.6% and the open phase relative error is 5.6%. The relative error measured on the open quotient reaches 3% for the whole Keele database.
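
The two ingredients described above, the product of wavelet transforms across scales and the open quotient, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the abstract does not name the wavelet, so a first derivative of a Gaussian at scales 1, 2, and 4 is assumed, and each GOI is assumed to lie between two consecutive GCIs. The function names are hypothetical.

```python
import math


def dog_kernel(scale):
    """First derivative of a Gaussian, truncated at 4 standard deviations.

    Assumption: the abstract does not specify the wavelet; this is a common
    choice for edge detection.
    """
    half = int(4 * scale)
    return [-t / scale**2 * math.exp(-t * t / (2 * scale**2))
            for t in range(-half, half + 1)]


def convolve_same(x, h):
    """Zero-padded convolution that returns len(x) samples."""
    half = len(h) // 2
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            i = n - (k - half)
            if 0 <= i < len(x):
                acc += hk * x[i]
        y.append(acc)
    return y


def multiscale_product(x, scales=(1, 2, 4)):
    """Pointwise product of the wavelet responses at several scales.

    A sharp edge responds at every scale, so it survives the product;
    noise responses are not aligned across scales and are attenuated.
    """
    p = [1.0] * len(x)
    for s in scales:
        w = convolve_same(x, dog_kernel(s))
        p = [a * b for a, b in zip(p, w)]
    return p


def open_quotients(gcis, gois):
    """Open quotient per cycle: open time divided by pitch period.

    Assumption: gois[k] lies between gcis[k] and gcis[k + 1], and the open
    phase runs from gois[k] to the next closing instant gcis[k + 1].
    """
    oq = []
    for k in range(len(gcis) - 1):
        period = gcis[k + 1] - gcis[k]
        open_time = gcis[k + 1] - gois[k]
        oq.append(open_time / period)
    return oq
```

In use, the extrema of `multiscale_product` applied to a speech frame would be taken as candidate GCIs and GOIs, and `open_quotients` would then be evaluated on the resulting instants (in samples or seconds, as long as both lists use the same unit).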


1980 ◽  
Vol 16 (12) ◽  
pp. 464 ◽  
Author(s):  
E. Ambikairajah ◽  
M.J. Carey ◽  
G. Tattersall

2012 ◽  
Vol 457-458 ◽  
pp. 1490-1493
Author(s):  
Wen Long Cai ◽  
Guang Ma

In this paper, the pitch period is detected against a background of strong noise by exploiting the Duffing oscillator's sensitivity to weak sinusoidal signals and its strong immunity to noise. The speech is then enhanced with a harmonic method based on the detected pitch. The test results show that the enhancement is clearly noticeable under low-SNR conditions and that the speech distortion is small.


2010 ◽  
Vol 02 (01) ◽  
pp. 65-80 ◽  
Author(s):  
KAIS KHALDI ◽  
MONIA TURKI-HADJ ALOUANE ◽  
ABDEL-OUAHAB BOUDRAA

This paper introduces a new method for voiced speech enhancement that combines Empirical Mode Decomposition (EMD) with the Adaptive Center Weighted Average (ACWA) filter. The noisy signal is decomposed adaptively into intrinsic oscillatory components called Intrinsic Mode Functions (IMFs). Since the structure of voiced speech is mostly distributed over the medium and low frequencies, the shorter-scale IMFs of the noisy signal are dominated by noise, whereas the longer-scale ones are less noisy. The main idea of the proposed approach is therefore to filter only the shorter-scale IMFs and to keep the longer-scale ones unchanged; filtering the longer-scale IMFs would introduce distortion rather than reduce noise. The denoising method is applied to several voiced speech signals at different noise levels, and the results are compared with the wavelet approach, the ACWA filter, and EMD–ACWA (filtering of all IMFs using the ACWA filter). Based on exhaustive simulations, we show that the proposed method reduces noise effectively and outperforms the other denoising methods: it improves the Signal-to-Noise Ratio (SNR) and offers better listening quality as measured by the Perceptual Evaluation of Speech Quality (PESQ). The present study is limited to signals corrupted by additive white Gaussian noise.


2006 ◽  
Vol 65 (7) ◽  
pp. 655-665
Author(s):  
A. Mantilla-Caeiros ◽  
Hector Manuel Perez-Meana ◽  
D. Mata-Verde ◽  
C. Angeles-Pina ◽  
J. Alvarado-Soriano ◽  
...  
