Single-channel speech enhancement based on joint constrained dictionary learning

Author(s):  
Linhui Sun ◽  
Yunyi Bu ◽  
Pingan Li ◽  
Zihao Wu

To improve the performance of speech enhancement in complex noise environments, a joint constrained dictionary learning method for single-channel speech enhancement is proposed, which solves the “cross projection” problem of signals in the joint dictionary. In this method, the new optimization function not only constrains the sparse representation of the noisy signal in the joint dictionary and controls the projection error of the speech and noise signals on their corresponding sub-dictionaries, but also minimizes the cross projection error and the correlation between the sub-dictionaries. In addition, adjustment factors are introduced to balance the weights of the constraint terms so that the learned joint dictionary is more discriminative. When the method is applied to single-channel speech enhancement, more of the speech components of the noisy signal are projected onto the clean speech sub-dictionary of the joint dictionary without being affected by the noise sub-dictionary, which improves the quality and intelligibility of the enhanced speech. The experimental results verify that the proposed algorithm outperforms the speech enhancement algorithm based on discriminative dictionary learning under white noise and colored noise environments in terms of time domain waveform, spectrogram, global signal-to-noise ratio, subjective evaluation of speech quality, and logarithmic spectrum distance.
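
As a rough illustration of the enhancement stage described above, the sketch below codes a noisy frame over a joint dictionary and rebuilds the speech from the speech sub-dictionary coefficients only. It assumes the sub-dictionaries are already learned and have unit-norm atoms. The greedy matching pursuit coder, the function names `matching_pursuit` and `enhance_frame`, and the toy atoms are illustrative assumptions; the constrained learning procedure of the paper is not reproduced here.

```python
def matching_pursuit(signal, atoms, n_nonzero):
    """Greedy sparse coding: pick the best-correlated atom, subtract, repeat.

    Assumes every atom has unit Euclidean norm.
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_nonzero):
        # Correlation of each atom with the current residual.
        corrs = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda idx: abs(corrs[idx]))
        if abs(corrs[best]) < 1e-12:
            break  # nothing left to explain
        coeffs[best] += corrs[best]
        residual = [r - corrs[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs


def enhance_frame(noisy, speech_atoms, noise_atoms, n_nonzero=2):
    """Code the frame over [speech | noise] atoms, keep only the speech part."""
    joint = speech_atoms + noise_atoms
    coeffs = matching_pursuit(noisy, joint, n_nonzero)
    speech_coeffs = coeffs[:len(speech_atoms)]
    estimate = [0.0] * len(noisy)
    for coeff, atom in zip(speech_coeffs, speech_atoms):
        for t, a in enumerate(atom):
            estimate[t] += coeff * a
    return estimate


# Toy example: orthogonal speech and noise atoms (hypothetical values).
speech_atoms = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
noise_atoms = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
print(enhance_frame([1.0, 0.0, 0.5, 0.0], speech_atoms, noise_atoms))
```

In this toy case the atoms are orthogonal, so no cross projection occurs. With correlated sub-dictionaries, part of the speech energy can land on noise atoms, which is the problem the constrained learning is meant to reduce.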

Author(s):  
Shifeng Ou ◽  
Peng Song ◽  
Ying Gao

The a priori signal-to-noise ratio (SNR) plays an essential role in many speech enhancement systems. Most existing approaches to estimating the a priori SNR exploit only the amplitude spectra and neglect the phase. Considering that incorporating phase information into a speech processing system can significantly improve the speech quality, this paper proposes a phase-sensitive decision-directed (DD) approach for estimating the a priori SNR. By representing the short-time discrete Fourier transform (STFT) signal spectra geometrically in the complex plane, the proposed approach estimates the a priori SNR using both the magnitude and phase information while making no assumptions about the phase difference between the clean speech and noise spectra. Objective evaluations in terms of spectrograms, segmental SNR, log-spectral distance (LSD) and short-time objective intelligibility (STOI) measures are presented to demonstrate the superiority of the proposed approach over several competitive methods under different noise conditions and input SNR levels.
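
For context, the classic amplitude-only decision-directed estimator that this line of work builds on can be sketched as follows. This is the conventional baseline for a single frequency bin, not the phase-sensitive variant proposed in the paper. The function name, the smoothing factor `alpha=0.98`, the floor `xi_min`, and the use of a Wiener gain to form the previous clean-speech estimate are assumptions made for illustration.

```python
def decision_directed_snr(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Track the a priori SNR of one frequency bin across frames.

    noisy_power: list of |Y|^2 values, one per frame.
    noise_power: noise power estimate for this bin (assumed known and fixed).
    """
    xi_track = []
    prev_clean_power = None
    for y_pow in noisy_power:
        gamma = y_pow / noise_power          # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)      # instantaneous (maximum-likelihood) term
        if prev_clean_power is None:
            xi = ml_term                     # first frame: no history yet
        else:
            # Weighted mix of the previous frame's estimate and the current frame.
            xi = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term
        xi = max(xi, xi_min)
        gain = xi / (1.0 + xi)               # Wiener gain (assumed suppression rule)
        prev_clean_power = (gain ** 2) * y_pow
        xi_track.append(xi)
    return xi_track


# Hypothetical bin whose noisy power is 11 times the noise power in every frame.
print(decision_directed_snr([11.0] * 5, noise_power=1.0))
```

The estimator uses only power values, so any phase relationship between speech and noise is ignored, which is the limitation the abstract points to.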


Geophysics ◽  
2020 ◽  
Vol 85 (3) ◽  
pp. KS51-KS61 ◽  
Author(s):  
Hang Wang ◽  
Quan Zhang ◽  
Guoyin Zhang ◽  
Jinwei Fang ◽  
Yangkang Chen

Microseismic monitoring is an indispensable technique for characterizing the physical processes caused by the extraction or injection of fluids during hydraulic fracturing. Microseismic data, however, are often contaminated with strong random noise and have a low signal-to-noise ratio (S/N). The low S/N in most microseismic data severely affects the accuracy and reliability of the source localization and source-mechanism inversion results. We have developed a new denoising framework to enhance the quality of microseismic data. We use the method of adaptive sparse dictionaries to learn the waveform features of the microseismic data by iteratively updating the dictionary atoms and sparse coefficients in an unsupervised way. Unlike most existing dictionary learning applications in the seismic community, we learn the features from 1D microseismic data and thereby capture 1D features of the waveforms. We develop a sparse dictionary learning framework, prepare the training patches, and implement the algorithm to obtain favorable denoising performance. We use extensive numerical examples and real microseismic data examples to demonstrate the validity of our method. Results show that the features of microseismic waveforms can be learned to distinguish signal patches from noise patches even from a single channel of microseismic data. However, more training data can make the learned features smoother and better at representing useful signal components.
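
The patch preparation step mentioned above can be sketched as a sliding window over a single 1D trace, with overlapping patches averaged back into a trace after processing. The function names, patch length, and stride are hypothetical choices for illustration; the dictionary update and sparse coding steps are omitted.

```python
def extract_patches(trace, patch_len, stride):
    """Cut a 1D trace into overlapping patches of fixed length."""
    starts = range(0, len(trace) - patch_len + 1, stride)
    return [trace[s:s + patch_len] for s in starts]


def average_patches(patches, trace_len, stride):
    """Rebuild a trace by averaging overlapping patches sample by sample."""
    total = [0.0] * trace_len
    count = [0] * trace_len
    for p_idx, patch in enumerate(patches):
        start = p_idx * stride
        for offset, value in enumerate(patch):
            total[start + offset] += value
            count[start + offset] += 1
    # Samples never covered by a patch are left at zero.
    return [t / c if c else 0.0 for t, c in zip(total, count)]


trace = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
patches = extract_patches(trace, patch_len=4, stride=2)
# In a denoising pipeline each patch would be sparse-coded here before averaging.
rebuilt = average_patches(patches, len(trace), stride=2)
print(len(patches), rebuilt == trace)
```

Without any processing between the two calls, the averaged result reproduces the input, which is a convenient sanity check before a denoising step is inserted.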


2020 ◽  
Vol 39 (5) ◽  
pp. 6881-6889
Author(s):  
Jie Wang ◽  
Linhuang Yan ◽  
Jiayi Tian ◽  
Minmin Yuan

In this paper, a bilateral spectrogram filtering (BSF)-based optimally modified log-spectral amplitude (OMLSA) estimator for single-channel speech enhancement is proposed. It significantly improves the performance of OMLSA, especially in highly non-stationary noise environments, by using bilateral filtering (BF), a widely used technique in image and visual processing, to preprocess the spectrogram of the noisy speech. When a speech spectrogram is treated as an image, BSF not only sharpens details and removes unwanted textures or background noise from the noisy speech spectrogram, but also preserves edges. The a posteriori signal-to-noise ratio (SNR) of the OMLSA algorithm is estimated after applying BSF to the noisy speech. In addition, to reduce computing costs, a fast and accurate BF is adopted that reduces the algorithm complexity to O(1) for each time-frequency bin. Finally, the proposed algorithm is compared with the original OMLSA and other classic denoising methods using various types of noise at different signal-to-noise ratios, in terms of objective evaluation metrics such as segmental signal-to-noise ratio improvement and perceptual evaluation of speech quality. The results show the validity of the improved BSF-based OMLSA algorithm.
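
To make the edge-preserving behaviour concrete, here is a minimal brute-force bilateral filter applied to a small spectrogram stored as a list of rows. It is a sketch of the textbook filter, whose cost grows with the window size, and not the fast O(1) variant the paper adopts. The function name, `radius`, `sigma_s`, and `sigma_r` are illustrative assumptions.

```python
import math


def bilateral_filter(spec, radius=1, sigma_s=1.0, sigma_r=1.0):
    """Smooth a 2D array while keeping sharp transitions.

    Each output bin is a weighted mean of its neighbours. The weight falls
    off with distance in the time-frequency plane (sigma_s) and with the
    difference in value from the centre bin (sigma_r).
    """
    rows, cols = len(spec), len(spec[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            center = spec[i][j]
            weighted_sum = 0.0
            weight_total = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        value = spec[ni][nj]
                        w_space = math.exp(-(di * di + dj * dj) / (2 * sigma_s ** 2))
                        w_range = math.exp(-((value - center) ** 2) / (2 * sigma_r ** 2))
                        weight = w_space * w_range
                        weighted_sum += weight * value
                        weight_total += weight
            out[i][j] = weighted_sum / weight_total
    return out


# A hypothetical spectrogram with a sharp step between low and high energy.
step = [[0.0, 0.0, 10.0, 10.0] for _ in range(3)]
for row in bilateral_filter(step):
    print([round(v, 3) for v in row])
```

Because the range weight is close to zero across the step, bins on either side barely mix, so the edge survives while small fluctuations within each region would be averaged out.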


2016 ◽  
Vol 41 (2) ◽  
pp. 245-254 ◽  
Author(s):  
Chengli Sun ◽  
Jianxiao Xie ◽  
Yan Leng

Subspace-based methods have been effectively used to estimate enhanced speech from noisy speech samples. In the traditional subspace approaches, a critical step is the separation of two invariant subspaces associated with signal and noise via subspace decomposition, which is often performed by singular-value decomposition or eigenvalue decomposition. However, these decomposition algorithms are highly sensitive to the presence of large corruptions, resulting in a large amount of residual noise within the enhanced speech in low signal-to-noise ratio (SNR) situations. In this paper, a joint low-rank and sparse matrix decomposition (JLSMD) based subspace method is proposed for speech enhancement. In the proposed method, we first structure the corrupted data as a Toeplitz matrix and estimate the effective rank of the underlying clean speech matrix. The subspace decomposition is then performed by means of JLSMD, where the decomposed low-rank part corresponds to the enhanced speech and the sparse part corresponds to the noise signal. An extensive set of experiments has been carried out for both white Gaussian noise and real-world noise. Experimental results show that the proposed method performs better than conventional methods in many types of strong noise conditions, yielding less residual noise and lower speech distortion.
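
The first step, arranging a frame of samples as a Toeplitz matrix, can be sketched as below. The layout (newest sample first in each row) and the function name are assumptions, since the abstract does not specify the exact construction. The rank estimation and the low-rank plus sparse decomposition are not shown.

```python
def toeplitz_from_frame(frame, n_cols):
    """Arrange a 1D frame as a matrix that is constant along each diagonal.

    Entry (i, j) holds frame[n_cols - 1 + i - j], so it depends only on i - j.
    """
    n_rows = len(frame) - n_cols + 1
    if n_rows < 1:
        raise ValueError("frame is shorter than the requested number of columns")
    return [
        [frame[n_cols - 1 + i - j] for j in range(n_cols)]
        for i in range(n_rows)
    ]


for row in toeplitz_from_frame([1, 2, 3, 4, 5], n_cols=3):
    print(row)
# [3, 2, 1]
# [4, 3, 2]
# [5, 4, 3]
```

Each row is a shifted window of the same frame, so the redundancy of the signal appears as low rank in the matrix, which is what a subspace decomposition then exploits.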


Author(s):  
Amart Sulong ◽  
Teddy Surya Gunawan ◽  
Mira Kartiwi

In communication systems, the choice of speech enhancement methodologies and algorithms is key to designing a system that delivers the best performance. The Wiener filter is one of the classical algorithms applied in speech processing to suppress the noise that corrupts the speech signal. In this work, a compressive sensing method with a randomized measurement matrix is combined with the Wiener filter to analyse the noisy speech signal, introducing less noise and producing a high signal-to-noise ratio. PESQ is used to measure the quality of the proposed algorithm design. The experimental results show that, under different noise environments, the proposed approach still effectively improves the noisy speech while maintaining a high PESQ score.


2016 ◽  
Vol 82 ◽  
pp. 38-52 ◽  
Author(s):  
Long Zhang ◽  
Guangzhao Bao ◽  
Jing Zhang ◽  
Zhongfu Ye
