Research on Pitch Extraction Algorithm of Noisy Speech

2012 ◽  
Vol 433-440 ◽  
pp. 4675-4678
Author(s):  
Hong Yan Xing ◽  
Cui Hua Yu ◽  
Peng Li

Pitch detection in noisy environments plays an important role in speech analysis and recognition. Drawing on the properties of the Hilbert-Huang transform and the EMD soft-threshold de-noising method, this paper proposes an effective pitch detection method for noisy speech signals. First, the EMD soft-threshold de-noising method is applied to reduce background noise; second, the Hilbert-Huang transform is used to detect the pitch period of the de-noised speech signal. The analysis presented in this paper shows that this approach is more accurate than conventional pitch detection methods for noisy speech, especially at low signal-to-noise ratio (SNR).
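As a rough illustration of the soft-threshold step named in this abstract, the Python sketch below shrinks each sample of a single mode (for example, one IMF) toward zero. The function name `soft_threshold`, the sample values, and the fixed threshold are assumptions for illustration only; the abstract does not say how the threshold is chosen or how the EMD itself is computed.

```python
def soft_threshold(samples, threshold):
    """Shrink every sample toward zero by `threshold` (soft thresholding).

    Samples whose magnitude is at or below the threshold become 0.0;
    larger samples keep their sign but lose `threshold` in magnitude.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    out = []
    for x in samples:
        if x > threshold:
            out.append(x - threshold)
        elif x < -threshold:
            out.append(x + threshold)
        else:
            out.append(0.0)
    return out


# Hypothetical IMF samples; 1.0 is an arbitrary threshold for the demo.
imf = [3.0, -0.5, -2.0, 0.25]
denoised = soft_threshold(imf, 1.0)
```

In a full pipeline this function would presumably be applied to each noise-dominated IMF before the modes are summed back into a de-noised signal.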

2013 ◽  
Vol 846-847 ◽  
pp. 1111-1114
Author(s):  
Yao Qi Wang ◽  
Xiao Peng Wang ◽  
Tao Lei

A new pitch detection method based on the Hilbert-Huang Transform (HHT) is proposed. First, the noisy speech signal is passed through a morphological filter to remove noise; HHT is then employed to obtain the Hilbert-Huang spectrum and to calculate the instantaneous energy and its derivative. Voiced and unvoiced segments are distinguished using abrupt changes in the instantaneous energy, and the pitch is tracked.


2014 ◽  
Vol 596 ◽  
pp. 433-436
Author(s):  
Yao Qi Wang ◽  
Xiao Peng Wang ◽  
Lv Cheng Wang

A new pitch detection method based on morphological filtering is proposed. The noisy speech signal is passed through a morphological filter to remove noise and highlight the pitch; HHT is then employed to obtain the Hilbert-Huang spectrum and to calculate the instantaneous energy and its derivative. The moments of glottal opening and closing can be accurately located by tracking abrupt changes in the instantaneous energy, so that variations in the pitch period can be accurately tracked. Compared with traditional pitch detection methods, this method not only faithfully describes the non-stationary and non-linear characteristics of the speech signal, but is also an adaptive process for analysing it. The experiments showed that the method is strongly noise-resistant and can accurately detect the pitch of speech at low SNR.
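The morphological filtering mentioned in this abstract can be sketched, under assumptions, as 1-D grey-scale opening and closing with a flat structuring element. The abstract does not give the structuring element or the filter combination used, so the window length `k` and the function names below are hypothetical.

```python
def erode(x, k):
    """Flat 1-D erosion: running minimum over a window of length k."""
    half = k // 2
    return [min(x[max(0, i - half): i + half + 1]) for i in range(len(x))]


def dilate(x, k):
    """Flat 1-D dilation: running maximum over a window of length k."""
    half = k // 2
    return [max(x[max(0, i - half): i + half + 1]) for i in range(len(x))]


def opening(x, k):
    """Erosion followed by dilation; removes narrow positive spikes."""
    return dilate(erode(x, k), k)


def closing(x, k):
    """Dilation followed by erosion; fills narrow negative dips."""
    return erode(dilate(x, k), k)


# Toy signals (assumed data): a one-sample spike, a one-sample dip,
# and a three-sample plateau that the opening should leave intact.
spike_removed = opening([0, 0, 5, 0, 0], 3)
dip_filled = closing([1, 1, -4, 1, 1], 3)
plateau_kept = opening([0, 2, 2, 2, 0], 3)
```

Features narrower than the window are suppressed while wider ones survive, which is why such filters can remove impulsive noise yet keep the slower pitch-related structure.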


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0254119
Author(s):  
Jordan A. Drew ◽  
W. Owen Brimijoin

Those experiencing hearing loss face severe challenges in perceiving speech in noisy situations such as a busy restaurant or cafe. There are many factors contributing to this deficit including decreased audibility, reduced frequency resolution, and decline in temporal synchrony across the auditory system. Some hearing assistive devices implement beamforming in which multiple microphones are used in combination to attenuate surrounding noise while the target speaker is left unattenuated. In increasingly challenging auditory environments, more complex beamforming algorithms are required, which increases the processing time needed to provide a useful signal-to-noise ratio of the target speech. This study investigated whether the benefits from signal enhancement from beamforming are outweighed by the negative impacts on perception from an increase in latency between the direct acoustic signal and the digitally enhanced signal. The hypothesis for this study is that an increase in latency between the two identical speech signals would decrease intelligibility of the speech signal. Using 3 gain / latency pairs from a beamforming simulation previously completed in lab, perceptual thresholds of SNR from a simulated use case were obtained from normal hearing participants. No significant differences were detected between the 3 conditions. When presented with 2 copies of the same speech signal presented at varying gain / latency pairs in a noisy environment, any negative intelligibility effects from latency are masked by the noise. These results allow for more lenient restrictions for limiting processing delays in hearing assistive devices.


1980 ◽  
Vol 68 (S1) ◽  
pp. S71-S71
Author(s):  
M. M. Sondhi ◽  
C. E. Schmidt ◽  
L. R. Rabiner

2013 ◽  
Vol 303-306 ◽  
pp. 1035-1038
Author(s):  
Jing Fang Wang

This paper designs a new pitch detection method using recurrence analysis, combining Empirical Mode Decomposition (EMD) with an Elliptic Filter (EF). The EMD stage of the Hilbert-Huang Transform (HHT) is used to solve the problem, and the noisy voice is first filtered by an elliptic band filter. The two Intrinsic Mode Functions (IMFs) produced by EMD that have the maximum correlation with the voice are then synthesized, after which the pitch can be easily separated. The results show that the new method performs better than the conventional autocorrelation algorithm and the cepstrum method, especially where the distinction between unvoiced (surd) and voiced (sonant) sounds is not evident, and that it is highly robust in noisy environments.
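For reference, the conventional autocorrelation baseline that this abstract compares against can be sketched as follows. This is not the proposed EMD-based method. The search range of 50 to 400 Hz, the 8 kHz sampling rate, and the synthetic 200 Hz test tone are assumptions chosen for the demo.

```python
import math


def autocorr_pitch(signal, fs, f_min=50.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Picks the lag, within the allowed pitch range, that maximises the
    unnormalised autocorrelation of the frame.
    """
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(signal) - 1)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        value = sum(signal[n] * signal[n + lag]
                    for n in range(len(signal) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return fs / best_lag


# Synthetic voiced frame: a clean 200 Hz sine, 0.1 s long (assumed data).
fs = 8000
frame = [math.sin(2 * math.pi * 200.0 * n / fs) for n in range(800)]
f0 = autocorr_pitch(frame, fs)
```

On clean periodic input the strongest peak falls at the true period; in noise or at voiced/unvoiced transitions the peak becomes ambiguous, which is the weakness the abstract refers to.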


Author(s):  
Amart Sulong ◽  
Teddy Surya Gunawan ◽  
Mira Kartiwi

In communication systems, the choice of methodologies and algorithms for speech enhancement is key to designing a system that delivers the best performance. The Wiener filter is one of the classical algorithms applied in speech processing to suppress noise corrupting the speech signal. In this work, a compressive sensing method with a randomized measurement matrix is combined with the Wiener filter to analyse the noisy speech signal, introducing less noise and producing a high signal-to-noise ratio. PESQ is used to measure the quality of the proposed algorithm design. The experimental results show that, under different noise environments, the method still effectively improves noisy speech while maintaining a high PESQ score.
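The Wiener filter named in this abstract is commonly applied per frequency bin as a gain between 0 and 1. The sketch below shows only that textbook gain rule and leaves out the compressive sensing stage. The power values are made up for illustration, and the noise power is assumed to be known and non-negative.

```python
def wiener_gain(noisy_power, noise_power):
    """Per-bin spectral Wiener gain (Pyy - Pnn) / Pyy, floored at 0.

    `noisy_power` is the power of the noisy observation in one bin and
    `noise_power` is an estimate of the noise power in the same bin.
    """
    if noisy_power <= 0.0:
        return 0.0
    clean_estimate = max(noisy_power - noise_power, 0.0)
    return clean_estimate / noisy_power


# Hypothetical power spectra for three frequency bins.
noisy_psd = [4.0, 1.0, 0.5]
noise_psd = [1.0, 1.0, 1.0]
gains = [wiener_gain(p, n) for p, n in zip(noisy_psd, noise_psd)]
```

Bins dominated by speech keep most of their energy, while bins at or below the noise floor are attenuated to zero.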


Author(s):  
Mourad Talbi ◽  
Med Salim Bouhlel

Background: In this paper, we propose a secure image watermarking technique that is applied to grayscale and color images. It consists of applying the SVD (Singular Value Decomposition) in the Lifting Wavelet Transform domain to embed a speech image (the watermark) into the host image. Methods: It also uses a signature in the embedding and extraction steps. Its performance is assessed by computing PSNR (Peak Signal to Noise Ratio), SSIM (Structural Similarity), SNR (Signal to Noise Ratio), SegSNR (Segmental SNR) and PESQ (Perceptual Evaluation of Speech Quality). Results: The PSNR and SSIM are used to evaluate the perceptual quality of the watermarked image compared to the original image. The SNR, SegSNR and PESQ are used to evaluate the perceptual quality of the reconstructed or extracted speech signal compared to the original speech signal. Conclusion: The results obtained from the computation of PSNR, SSIM, SNR, SegSNR and PESQ demonstrate the performance of the proposed technique.
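Two of the metrics listed in this abstract, PSNR and SNR, have simple standard definitions that can be sketched as below. The 8-bit peak value of 255 and the toy inputs are assumptions; SSIM, SegSNR and PESQ are more involved and are not covered here.

```python
import math


def psnr_db(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length
    sequences of pixel values (peak defaults to the 8-bit maximum)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return float("inf")  # identical inputs
    return 10.0 * math.log10(peak ** 2 / mse)


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB of `estimate` relative to `reference`."""
    signal_energy = sum(s * s for s in reference)
    noise_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if noise_energy == 0:
        return float("inf")  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / noise_energy)


# Toy data (assumed): a two-pixel "image" and a four-sample "signal".
image_psnr = psnr_db([0, 255], [0, 254])
speech_snr = snr_db([1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 0.0])
```

Higher values mean the watermarked image or the extracted speech is closer to the original.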


1997 ◽  
Vol 101 (3-4) ◽  
pp. 177-185
Author(s):  
Eiji Uchino ◽  
Shin Nakamura ◽  
Takeshi Yamakawa
