A neural network based noise suppression method for transient noise control with low-complexity computation

2021, Vol 263 (1), pp. 5902-5909
Author(s): Yiya Hao, Shuai Cheng, Gong Chen, Yaobin Chen, Liang Ruan

Over the decades, noise-suppression (NS) methods for speech enhancement (SE) have been widely used, including both conventional signal-processing methods and deep neural network (DNN) methods. Although stationary noise can be suppressed successfully with conventional or DNN methods, suppressing non-stationary noise, especially transient noise, remains significantly challenging. Compared with conventional NS methods, DNN-based NS methods can handle non-stationary noise more effectively by learning the noise's time-frequency characteristics. However, most DNN methods are difficult to deploy on mobile devices because of their heavy computational complexity, and even the few low-complexity DNN methods proposed for real-time use suffer degraded robustness and generalization across noise types. This paper proposes a single-channel DNN-based NS method for transient noise with low computational complexity. The proposed method enhances the signal-to-noise ratio (SNR) while minimizing speech distortion, yielding a superior improvement in speech quality across different noise types, including transient noise.
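
For readers unfamiliar with mask-based DNN noise suppression, the sketch below shows the typical low-complexity pipeline: a small recurrent network maps log-magnitude STFT frames to per-bin gains. This is not the authors' model; the GRU choice, layer sizes, and the 257-bin STFT are illustrative assumptions.

```python
# Minimal sketch of a low-complexity, single-channel mask-based NS network
# (illustrative only; not the architecture proposed in the paper).
import torch
import torch.nn as nn

class TinyMaskNet(nn.Module):
    def __init__(self, n_bins=257, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(input_size=n_bins, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_bins)

    def forward(self, log_mag):               # log_mag: (batch, frames, n_bins)
        h, _ = self.rnn(log_mag)              # causal, frame-by-frame recurrence
        mask = torch.sigmoid(self.fc(h))      # per-bin gain in [0, 1]
        return mask                           # apply to noisy magnitude, keep noisy phase

model = TinyMaskNet()
noisy = torch.randn(1, 100, 257)              # dummy batch of log-magnitude frames
gain = model(noisy)
print(gain.shape)                             # torch.Size([1, 100, 257])
```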

Author(s): Maximilian Strake, Bruno Defraene, Kristoff Fluyt, Wouter Tirry, Tim Fingscheidt

AbstractSingle-channel speech enhancement in highly non-stationary noise conditions is a very challenging task, especially when interfering speech is included in the noise. Deep learning-based approaches have notably improved the performance of speech enhancement algorithms under such conditions, but still introduce speech distortions if strong noise suppression shall be achieved. We propose to address this problem by using a two-stage approach, first performing noise suppression and subsequently restoring natural sounding speech, using specifically chosen neural network topologies and loss functions for each task. A mask-based long short-term memory (LSTM) network is employed for noise suppression and speech restoration is performed via spectral mapping with a convolutional encoder-decoder network (CED). The proposed method improves speech quality (PESQ) over state-of-the-art single-stage methods by about 0.1 points for unseen highly non-stationary noise types including interfering speech. Furthermore, it is able to increase intelligibility in low-SNR conditions and consistently outperforms all reference methods.
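
A structural sketch of the two-stage idea follows: stage 1 masks the noisy magnitude with an LSTM, stage 2 performs spectral mapping with a convolutional encoder-decoder. Layer sizes and kernel choices are assumptions, not the authors' exact topology.

```python
# Two-stage sketch: mask-based LSTM for noise suppression, then a CED for
# spectral-mapping speech restoration (sizes assumed, illustrative only).
import torch
import torch.nn as nn

class MaskLSTM(nn.Module):
    def __init__(self, n_bins=256, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(n_bins, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_bins)

    def forward(self, noisy_mag):                        # (B, T, F)
        h, _ = self.lstm(noisy_mag)
        return torch.sigmoid(self.out(h)) * noisy_mag    # stage-1 enhanced magnitude

class CED(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1, output_padding=1),
        )

    def forward(self, mag):                              # (B, T, F) spectral mapping
        x = mag.unsqueeze(1)                             # add channel dim: (B, 1, T, F)
        return self.dec(self.enc(x)).squeeze(1)

noisy = torch.rand(2, 64, 256)                           # dummy magnitude spectrograms
restored = CED()(MaskLSTM(n_bins=256)(noisy))
print(restored.shape)                                    # torch.Size([2, 64, 256])
```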


2017, Vol 29 (1), pp. 114-124
Author(s): Kazuhiro Nakadai, Taiki Tezuka, Takami Yoshida
[Figure: Ego-noise suppression achieves speech recognition even during motion] This paper addresses ego-motion noise suppression for a robot. Many ego-motion noise suppression methods use motion information such as the position, velocity, and acceleration of each joint to infer ego-motion noise. However, such inferences are not reliable, since motion information and ego-motion noise are not always correlated. We propose a new framework for ego-motion noise suppression based on single-channel processing using only acoustic signals captured with a microphone. In the proposed framework, ego-motion noise features and their number are automatically estimated in advance from an ego-motion noise input using Infinite Non-negative Matrix Factorization (INMF), a non-parametric Bayesian model that does not use explicit motion information. After that, the proposed Semi-Blind INMF (SB-INMF) is applied to an input signal consisting of both the target and ego-motion noise signals. The ego-motion noise features obtained with INMF are used as inputs to the SB-INMF and treated as fixed features for extracting the target signal. Finally, the target signal is extracted with SB-INMF using these newly estimated features. The proposed framework was applied to ego-motion noise suppression on two types of humanoid robots. Experimental results showed that ego-motion noise was suppressed effectively and efficiently, in terms of both signal-to-noise ratio and automatic speech recognition performance, compared to a conventional template-based ego-motion noise suppression method using motion information. Thus, the proposed method works properly on a robot without a motion-information interface. (This work is an extension of: Taiki Tezuka, Takami Yoshida, Kazuhiro Nakadai, "Ego-motion noise suppression for robots based on Semi-Blind Infinite Non-negative Matrix Factorization," ICRA 2014, pp. 6293-6298, 2014.)
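
The semi-blind step can be illustrated with plain NMF multiplicative updates, as in the NumPy sketch below. This is not the Bayesian INMF of the paper; the basis counts, iteration budget, and Euclidean update rule are illustrative assumptions. The key point is that pre-learned noise bases W_n stay fixed while speech bases and all activations are updated.

```python
# Semi-blind NMF sketch: fix the noise bases W_n, update speech bases W_s and
# activations H with standard Euclidean multiplicative updates (illustrative).
import numpy as np

def semi_blind_nmf(V, W_n, n_speech_bases=20, n_iter=200, eps=1e-10):
    F, T = V.shape
    rng = np.random.default_rng(0)
    W_s = rng.random((F, n_speech_bases)) + eps
    H = rng.random((W_n.shape[1] + n_speech_bases, T)) + eps
    for _ in range(n_iter):
        W = np.hstack([W_n, W_s])
        H *= (W.T @ V) / (W.T @ W @ H + eps)           # update all activations
        Hs = H[W_n.shape[1]:]                          # speech activations only
        W_s *= (V @ Hs.T) / (W @ H @ Hs.T + eps)       # update speech bases only
    return W_s, H

# Usage: V is a magnitude spectrogram of noisy speech, W_n the pre-learned noise bases.
V = np.abs(np.random.randn(257, 100))
W_n = np.abs(np.random.randn(257, 10))
W_s, H = semi_blind_nmf(V, W_n)
speech_estimate = W_s @ H[10:]                         # W_n has 10 bases in this example
print(speech_estimate.shape)                           # (257, 100)
```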


Author(s): Yuxuan Ke, Andong Li, Chengshi Zheng, Renhua Peng, Xiaodong Li

Abstract Deep learning-based speech enhancement algorithms have shown their powerful ability to remove both stationary and non-stationary noise components from noisy speech observations. However, they often introduce artificial residual noise, especially when the training target does not contain phase information, e.g., the ideal ratio mask or the clean speech magnitude and its variations. It is well known that once the power of the residual noise components exceeds the noise masking threshold of the human auditory system, the perceptual speech quality may degrade. One intuitive remedy is to further suppress the residual noise components with a postprocessing scheme. However, the highly non-stationary nature of this kind of residual noise makes estimating its power spectral density (PSD) a challenging problem. To solve this problem, this paper proposes three strategies to estimate the noise PSD frame by frame, after which the residual noise can be removed effectively by applying a gain function based on the decision-directed approach. Objective measurement results show that the proposed postfiltering strategies outperform the conventional postfilter in terms of segmental signal-to-noise ratio (SNR) as well as speech quality improvement. Moreover, an AB subjective listening test shows that the preference percentages of the proposed strategies are over 60%.
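
The decision-directed gain referenced above can be sketched as follows. The sketch assumes a generic per-bin noise-PSD estimate and a smoothing factor of 0.98; it does not reproduce the paper's three PSD-estimation strategies.

```python
# Decision-directed a priori SNR estimation with a Wiener-type gain (sketch).
# Y: noisy STFT (freq x frames); noise_psd: per-bin, per-frame noise PSD estimate.
import numpy as np

def decision_directed_gain(Y, noise_psd, alpha=0.98, xi_min=10 ** (-25 / 10)):
    F, T = Y.shape
    gains = np.ones((F, T))
    prev_clean_power = np.abs(Y[:, 0]) ** 2                 # simple initialization
    for t in range(T):
        lam = np.maximum(noise_psd[:, t], 1e-12)
        gamma = np.abs(Y[:, t]) ** 2 / lam                  # a posteriori SNR
        xi = alpha * prev_clean_power / lam \
             + (1 - alpha) * np.maximum(gamma - 1, 0)       # decision-directed a priori SNR
        xi = np.maximum(xi, xi_min)
        G = xi / (1 + xi)                                   # Wiener-type gain
        gains[:, t] = G
        prev_clean_power = (G * np.abs(Y[:, t])) ** 2       # memory for the next frame
    return gains

# Enhanced spectrum: gains * Y, then inverse STFT using the noisy phase.
```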


2012, Vol 58 (3), pp. 273-278
Author(s): Taleb Moazzeni, Amei Amei, Jian Ma, Yingtao Jiang

Abstract Signal-to-noise ratio (SNR) information is required in many communication receivers, and their proper operation is, to a large extent, tied to the SNR estimation techniques they employ. Most available SNR estimators are based on approaches that either require a large observation length or suffer from high computational complexity. In this paper, we propose a low-complexity, yet accurate, SNR estimation technique that yields meaningful estimates for short data records. It is shown that our estimator comes fairly close to the Cramér-Rao lower bound (CRLB) for high SNR values. Numerical results also confirm that, in terms of convergence speed, the proposed technique outperforms the popular moment-based method, M2M4.
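
The moment-based M2M4 baseline mentioned above can be sketched as follows for a constant-modulus (e.g., M-PSK) signal in complex AWGN; the paper's own low-complexity estimator is not reproduced here.

```python
# Classical M2M4 SNR estimator (sketch): for a constant-modulus signal in complex
# AWGN, signal power S = sqrt(2*M2^2 - M4) and noise power N = M2 - S.
import numpy as np

def m2m4_snr_db(y):
    m2 = np.mean(np.abs(y) ** 2)                   # second moment: S + N
    m4 = np.mean(np.abs(y) ** 4)                   # fourth moment
    s = np.sqrt(np.maximum(2 * m2 ** 2 - m4, 0))   # signal power estimate
    n = np.maximum(m2 - s, 1e-12)                  # noise power estimate
    return 10 * np.log10(np.maximum(s, 1e-12) / n)

# Example: QPSK at 10 dB SNR over a short record (where M2M4 converges slowly).
rng = np.random.default_rng(1)
sym = np.exp(1j * (np.pi / 2 * rng.integers(0, 4, 200) + np.pi / 4))
noise = (rng.standard_normal(200) + 1j * rng.standard_normal(200)) / np.sqrt(2)
y = sym + noise * 10 ** (-10 / 20)
print(m2m4_snr_db(y))                              # roughly 10 dB, with short-record variance
```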


Author(s): Wenchao Du, Hu Chen, Hongyu Yang, Yi Zhang

Abstract Generative adversarial networks (GANs) have been applied to low-dose CT (LDCT) images to predict normal-dose CT images. However, undesired artifacts and distorted details bring uncertainty to the clinical diagnosis. In order to improve the visual quality while suppressing the noise, this paper mainly studies the two key components of deep learning-based LDCT restoration models, the network architecture and the adversarial loss, and proposes a disentangled noise suppression method based on GAN (DNSGAN) for LDCT. Specifically, a generator network containing noise suppression and structure recovery modules is proposed. Furthermore, a multi-scaled relativistic adversarial loss is introduced to preserve the finer structures of the generated images. Experiments on simulated and real LDCT datasets show that the proposed method can effectively remove noise while recovering finer details and provides better visual perception than other state-of-the-art methods.
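
The relativistic adversarial loss that this kind of method builds on can be sketched as below; the multi-scale weighting and the DNSGAN generator modules are not reproduced, and the toy critic is an assumption for illustration only.

```python
# Relativistic average adversarial loss (sketch): the critic scores real patches
# relative to the average fake score, and vice versa. Illustrative, not DNSGAN.
import torch
import torch.nn as nn
import torch.nn.functional as F

def relativistic_d_loss(c_real, c_fake):
    # real should score above the average fake score; fake below the average real score
    loss_real = F.binary_cross_entropy_with_logits(c_real - c_fake.mean(), torch.ones_like(c_real))
    loss_fake = F.binary_cross_entropy_with_logits(c_fake - c_real.mean(), torch.zeros_like(c_fake))
    return loss_real + loss_fake

def relativistic_g_loss(c_real, c_fake):
    # the generator tries to flip the relativistic judgement
    return (F.binary_cross_entropy_with_logits(c_fake - c_real.mean(), torch.ones_like(c_fake))
            + F.binary_cross_entropy_with_logits(c_real - c_fake.mean(), torch.zeros_like(c_real)))

# Usage with a toy critic producing raw (un-sigmoided) scores for 64x64 CT patches.
critic = nn.Sequential(nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                       nn.Flatten(), nn.Linear(8 * 32 * 32, 1))
real, fake = torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64)
print(relativistic_d_loss(critic(real), critic(fake.detach())).item())
```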


Geophysics, 2006, Vol 71 (3), pp. V79-V86
Author(s): Hakan Karsli, Derman Dondurur, Günay Çifçi

Time-dependent amplitude and phase information of stacked seismic data are processed independently using complex trace analysis in order to facilitate interpretation by improving resolution and decreasing random noise. We represent seismic traces by their envelopes and instantaneous phases obtained via the Hilbert transform. The proposed method reduces the amplitudes of the low-frequency components of the envelope while preserving the phase information. Several tests are performed to investigate the behavior of the method for resolution improvement and noise suppression. Applications to both 1D and 2D synthetic data show that the method is capable of reducing the amplitudes and temporal widths of the side lobes of the input wavelets; hence, the spectral bandwidth of the input seismic data is enhanced, resulting in an improvement in the signal-to-noise ratio. The bright-spot anomalies observed on the stacked sections become clearer because the output seismic traces have a simplified appearance, allowing easier data interpretation. We recommend applying this simple processing for signal enhancement prior to interpretation, especially for single-channel and low-fold seismic data.
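
The envelope/phase decomposition underlying the method can be sketched as follows; the specific attenuation of low-frequency envelope components used here (subtracting a moving-average trend) is an illustrative assumption, not the authors' filter.

```python
# Complex trace analysis sketch: decompose a trace into envelope and instantaneous
# phase via the Hilbert transform, attenuate slow envelope components, and rebuild.
import numpy as np
from scipy.signal import hilbert

def enhance_trace(trace, smooth_len=25):
    analytic = hilbert(trace)                       # analytic signal via Hilbert transform
    envelope = np.abs(analytic)                     # instantaneous amplitude
    phase = np.angle(analytic)                      # instantaneous phase (preserved)
    kernel = np.ones(smooth_len) / smooth_len
    low_freq = np.convolve(envelope, kernel, mode="same")
    sharpened = np.maximum(envelope - low_freq, 0)  # attenuate low-frequency envelope part
    return sharpened * np.cos(phase)                # rebuild trace from modified envelope

# Example on a noisy synthetic single-channel trace.
t = np.linspace(0, 1, 1000)
trace = np.sin(2 * np.pi * 30 * t) * np.exp(-((t - 0.5) ** 2) / 0.002) \
        + 0.05 * np.random.randn(1000)
print(enhance_trace(trace).shape)                   # (1000,)
```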

