Speech enhancement algorithm of improved OMLSA based on bilateral spectrogram filtering

2020 ◽  
Vol 39 (5) ◽  
pp. 6881-6889
Author(s):  
Jie Wang ◽  
Linhuang Yan ◽  
Jiayi Tian ◽  
Minmin Yuan

In this paper, a bilateral spectrogram filtering (BSF)-based optimally modified log-spectral amplitude (OMLSA) estimator for single-channel speech enhancement is proposed. It significantly improves the performance of OMLSA, especially in highly non-stationary noise environments, by using bilateral filtering (BF), a technique widely used in image and visual processing, to preprocess the spectrogram of the noisy speech. When the speech spectrogram is treated as an image, BSF not only sharpens details and removes unwanted textures or background noise but also preserves edges. The a posteriori signal-to-noise ratio (SNR) of the OMLSA algorithm is estimated after applying BSF to the noisy speech. To reduce computing costs, a fast and accurate BF is adopted, which reduces the algorithm complexity to O(1) for each time-frequency bin. Finally, the proposed algorithm is compared with the original OMLSA and other classic denoising methods on various types of noise at different signal-to-noise ratios, using objective evaluation metrics such as segmental signal-to-noise ratio improvement and perceptual evaluation of speech quality. The results show the validity of the improved BSF-based OMLSA algorithm.
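The edge-preserving behaviour described in the abstract can be illustrated with a minimal sketch of a plain bilateral filter applied to a small magnitude spectrogram. This is not the authors' implementation: it is the textbook brute-force form, whose cost grows with the window size rather than being O(1) per bin, and the function name `bilateral_filter` and the parameters `radius`, `sigma_s` and `sigma_r` are assumptions made for illustration.

```python
import math


def bilateral_filter(spec, radius=1, sigma_s=1.0, sigma_r=1.0):
    """Brute-force bilateral filter over a 2-D list.

    spec    : list of rows (frequency bins), each a list of frame values
    radius  : half-width of the square neighbourhood (hypothetical default)
    sigma_s : spread of the spatial (time-frequency distance) kernel
    sigma_r : spread of the range (value difference) kernel
    """
    rows, cols = len(spec), len(spec[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            centre = spec[i][j]
            num = 0.0
            den = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    if not (0 <= ni < rows and 0 <= nj < cols):
                        continue  # skip neighbours outside the spectrogram
                    value = spec[ni][nj]
                    # Nearby bins count more than distant ones.
                    spatial = math.exp(-(di * di + dj * dj) / (2.0 * sigma_s ** 2))
                    # Bins with similar magnitude count more; this keeps edges sharp.
                    similar = math.exp(-((value - centre) ** 2) / (2.0 * sigma_r ** 2))
                    weight = spatial * similar
                    num += weight * value
                    den += weight
            out[i][j] = num / den  # den >= 1 because the centre bin has weight 1
    return out
```

With a sharp step such as `[[0.0, 0.0, 10.0, 10.0]] * 3` and `sigma_r=1.0`, bins on either side of the step barely influence each other, so the edge survives, while a small bump inside a flat region is pulled toward its neighbours.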

Author(s):  
Mourad Talbi ◽  
Med Salim Bouhlel

Background: In this paper, we propose a secure image watermarking technique that is applied to grayscale and color images. It applies the SVD (Singular Value Decomposition) in the Lifting Wavelet Transform domain to embed a speech image (the watermark) into the host image. Methods: It also uses a signature in the embedding and extraction steps. Its performance is assessed by computing PSNR (Peak Signal to Noise Ratio), SSIM (Structural Similarity), SNR (Signal to Noise Ratio), SegSNR (Segmental SNR) and PESQ (Perceptual Evaluation of Speech Quality). Results: The PSNR and SSIM are used for evaluating the perceptual quality of the watermarked image compared to the original image. The SNR, SegSNR and PESQ are used for evaluating the perceptual quality of the reconstructed or extracted speech signal compared to the original speech signal. Conclusion: The results obtained from the computation of PSNR, SSIM, SNR, SegSNR and PESQ show the performance of the proposed technique.
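Two of the metrics named above can be sketched in a few lines using their common textbook definitions. The abstract does not give the exact formulas used, so the 8-bit peak value of 255, the non-overlapping frames, the `frame_len` parameter and the function names are assumptions. Practical SegSNR implementations often also clamp each frame's value to a fixed dB range, which is omitted here.

```python
import math


def psnr(original, distorted, peak=255.0):
    """PSNR in dB between two equal-length flat lists of pixel values.

    peak=255.0 assumes 8-bit images (an assumption, not stated in the abstract).
    """
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def segmental_snr(clean, processed, frame_len=4, eps=1e-12):
    """Mean of per-frame SNRs in dB over non-overlapping frames.

    frame_len is a hypothetical illustration value; eps avoids log(0) and
    division by zero in silent or perfectly reconstructed frames.
    """
    if len(clean) < frame_len:
        raise ValueError("signal is shorter than one frame")
    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        signal_energy = sum(x * x for x in c)
        error_energy = sum((x - y) ** 2 for x, y in zip(c, p))
        frame_snrs.append(10.0 * math.log10((signal_energy + eps) / (error_energy + eps)))
    return sum(frame_snrs) / len(frame_snrs)
```

Averaging the SNR frame by frame, rather than over the whole signal, stops loud segments from dominating the score, which is why SegSNR is usually reported alongside the global SNR for speech.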


2018 ◽  
Vol 143 (3) ◽  
pp. 1751-1751 ◽  
Author(s):  
Frederic Apoux ◽  
Brittney Carter ◽  
Karl P. Velik ◽  
Eric Healy

Geophysics ◽  
2013 ◽  
Vol 78 (6) ◽  
pp. V229-V237 ◽  
Author(s):  
Hongbo Lin ◽  
Yue Li ◽  
Baojun Yang ◽  
Haitao Ma

Time-frequency peak filtering (TFPF) may efficiently suppress random noise and hence improve the signal-to-noise ratio. However, the errors are not always satisfactory when applying the TFPF to fast-varying seismic signals. We begin with an error analysis for the TFPF by using the spread factor of the phase and cumulants of noise. This analysis shows that the nonlinear signal component and non-Gaussian random noise lead to the deviation of the pseudo-Wigner-Ville distribution (PWVD) peaks from the instantaneous frequency. The deviation introduces signal distortion and random oscillations in the result of the TFPF. We propose a weighted reassigned smoothed PWVD with less deviation than the PWVD. The proposed method adopts a frequency window to smooth away the residual oscillations in the PWVD, and incorporates a weight function in the reassignment, which sharpens the time-frequency distribution to reduce the deviation. Because the weight function is determined by the lateral coherence of seismic data, the smoothed PWVD is assigned to the accurate instantaneous frequency for desired signal components by weighted frequency reassignment. As a result, the TFPF based on the weighted reassigned PWVD (TFPF_WR) can be more effective in suppressing random noise and preserving signal than the TFPF using the PWVD. We test the proposed method on synthetic and field seismic data, and compare it with a wavelet-transform method and [Formula: see text] prediction filter. The results show that the proposed method preserves signal better than the other methods under low signal-to-noise ratio.


2012 ◽  
Vol 226-228 ◽  
pp. 237-240 ◽  
Author(s):  
Mei Jun Zhang ◽  
Hao Chen ◽  
Chuang Wang ◽  
Qing Cao

To effectively extract detection signals from non-stationary signals in a noisy background, an improved EEMD is put forward on the basis of EEMD, and threshold noise reduction with the improved EEMD is studied in this paper. On a simulated signal, the noise reduction effects of the wavelet, EMD, EEMD, and improved EEMD methods are compared. The improved EEMD threshold noise reduction gives the best noise reduction result, with the highest signal-to-noise ratio and the smallest standard deviation error. After improved EEMD threshold noise reduction, the time-domain waveform of the measured signal is smooth, high-frequency noise is clearly reduced in the Hilbert time-frequency spectrum, the signal-to-noise ratio improves significantly, and the signal characteristics are very clear.
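The abstract does not specify the thresholding rule, so the following is only a sketch of one common choice: soft thresholding of a single decomposed component with a universal threshold, where the noise level is estimated from the median absolute value. The EEMD decomposition itself is not shown, and the function names and the 0.6745 scaling constant (the usual Gaussian-noise convention) are assumptions, not the authors' method.

```python
import math
import statistics


def universal_threshold(component):
    """Universal threshold sigma * sqrt(2 ln N) for one component.

    sigma is estimated as median(|x|) / 0.6745, a common robust estimate
    under a Gaussian-noise assumption (not taken from the abstract).
    """
    sigma = statistics.median([abs(x) for x in component]) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(len(component)))


def soft_threshold(component, thr):
    """Shrink every sample toward zero by thr; samples below thr become zero."""
    return [math.copysign(max(abs(x) - thr, 0.0), x) for x in component]
```

In a threshold-denoising scheme of this kind, the noisy high-frequency components would each be passed through `soft_threshold` and the signal rebuilt by summing all components.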


2015 ◽  
Author(s):  
Jinjiang Wang ◽  
Robert X. Gao ◽  
Xinyao Tang ◽  
Zhaoyan Fan ◽  
Peng Wang

Data communication through metallic structures is commonly encountered in manufacturing equipment and process monitoring and control. This paper presents a signal processing technique for enhancing the signal-to-noise ratio and the data transmission rate in ultrasound-based wireless data transmission through metallic structures. A multi-carrier coded-ultrasonic wave modulation scheme is first investigated to achieve high-bit-rate data communication while reducing the inter-symbol interference and data loss caused by the inherent signal attenuation, wave diffraction and reflection in metallic structures. To improve the signal-to-noise ratio, the dual-tree wavelet packet transform (DT-WPT) is investigated to separate multi-carrier signals under noise contamination, given its properties of shift-invariance and flexible time-frequency partitioning. A new envelope extraction and threshold setting strategy for selected wavelet coefficients is then introduced to retrieve the coded digital information. Experimental studies are performed to evaluate the effectiveness of the developed signal processing method for manufacturing.


2019 ◽  
Vol 24 (4) ◽  
pp. 728-735
Author(s):  
Mourad Talbi ◽  
Med Salim Bouhlel

In this paper, a new speech compression technique is proposed. This technique applies a Psychoacoustic Model and a general approach for Filter Bank Design using optimization. It is evaluated and compared with a compression technique using an MDCT (Modified Discrete Cosine Transform) Filter Bank of 32 Filters and a Psychoacoustic Model. The evaluation and comparison are performed by calculating the bits before and after compression, PSNR (Peak Signal to Noise Ratio), NRMSE (Normalized Root Mean Square Error), SNR (Signal to Noise Ratio) and PESQ (Perceptual Evaluation of Speech Quality). The two techniques are tested on a number of speech signals sampled at 8 kHz. The results show that the proposed technique outperforms the second compression technique (based on a Psychoacoustic Model and MDCT Filter Bank) in terms of bits after compression and compression ratio: the proposed technique yields higher compression ratios. Moreover, the proposed compression technique produces reconstructed speech signals of acceptable perceptual quality, as shown by the values of SNR, PSNR, NRMSE and PESQ.
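The simpler figures of merit listed above can be sketched as follows. The abstract does not state the formulas, so these are common textbook definitions: NRMSE in particular has several conventions, and normalising by the spread of the reference signal around its mean is an assumption, as are the function names.

```python
import math


def snr_db(reference, reconstructed):
    """Global SNR in dB: reference energy over reconstruction-error energy."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, reconstructed))
    if noise == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal / noise)


def nrmse(reference, reconstructed):
    """RMS error normalised by the reference's deviation from its mean.

    This normalisation is one of several in use (an assumption here).
    """
    mean = sum(reference) / len(reference)
    num = sum((x - y) ** 2 for x, y in zip(reference, reconstructed))
    den = sum((x - mean) ** 2 for x in reference)
    return math.sqrt(num / den)


def compression_ratio(bits_before, bits_after):
    """Ratio of original size to compressed size; higher means more compression."""
    return bits_before / bits_after
```

For instance, one second of 8 kHz speech stored at a hypothetical 16 bits per sample occupies 128000 bits, so a coder that emits 32000 bits for it has a compression ratio of 4.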

