Weighted Speech Distortion Losses for Neural-Network-Based Real-Time Speech Enhancement

This paper proposes an improved speech enhancement algorithm based on Wiener filtering that addresses the problems of speech distortion and musical noise. The algorithm uses the masking properties of the human auditory system when calculating the gain at each spectral point, so that signal components in the enhanced speech whose energy is already below the masking threshold are not attenuated further; this trade-off between noise suppression and speech distortion introduces less distortion into the enhanced speech. To eliminate the “musical noise”, a spectrum-shaping technique that averages between adjacent frames is adopted, and to guarantee real-time operation, a two-stage moving-average strategy is used. Computer simulation results show that the proposed algorithm outperforms the traditional Wiener method in CPU cost, real-time statistics, and the reduction of speech distortion and residual musical noise.
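The gain calculation and inter-frame averaging described in this abstract can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm: the abstract gives no formulas, so a fixed `gain_floor` stands in for the masking-threshold-dependent limit, a plain moving average over an odd number of frames stands in for the spectrum-shaping step, and the function names `wiener_gain` and `smooth_over_frames` are hypothetical.

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd, gain_floor=0.1):
    """Per-bin Wiener gain from power spectra of shape (frames, bins)."""
    # SNR estimate by power subtraction, clipped at zero.
    snr = np.maximum(noisy_psd / np.maximum(noise_psd, 1e-12) - 1.0, 0.0)
    gain = snr / (1.0 + snr)
    # Assumption: a fixed floor replaces the masking-threshold limit,
    # so weak components are not attenuated without bound.
    return np.maximum(gain, gain_floor)

def smooth_over_frames(gain, width=3):
    """Moving average of the gain across adjacent frames (width must be odd)."""
    kernel = np.ones(width) / width
    pad = width // 2
    # Repeat the edge frames so the output keeps the input's frame count.
    padded = np.pad(gain, ((pad, pad), (0, 0)), mode="edge")
    columns = [np.convolve(padded[:, k], kernel, mode="valid")
               for k in range(gain.shape[1])]
    return np.stack(columns, axis=1)
```

Averaging the gain over neighbouring frames reduces the isolated frame-to-frame gain peaks that are heard as musical noise, at the cost of slightly slower tracking of speech onsets.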

Real-Time Multi-Channel Speech Enhancement Based on Neural Network Masking with Attention Model

10.21437/interspeech.2021-2266 ◽ 2021

Author(s): Cheng Xue ◽ Weilong Huang ◽ Weiguang Chen ◽ Jinwei Feng

Keyword(s): Neural Network ◽ Real Time ◽ Speech Enhancement ◽ Attention Model

A Convolutional Recurrent Neural Network for Real-Time Speech Enhancement

10.21437/interspeech.2018-1405 ◽ 2018 ◽ Cited By ~ 38

Author(s): Ke Tan ◽ DeLiang Wang

Keyword(s): Neural Network ◽ Real Time ◽ Speech Enhancement ◽ Recurrent Neural Network

A Real-Time Dual-Microphone Speech Enhancement Algorithm Assisted by Bone Conduction Sensor

Sensors ◽ 10.3390/s20185050 ◽ 2020 ◽ Vol 20 (18) ◽ pp. 5050

Author(s): Yi Zhou ◽ Yufan Chen ◽ Yongbao Ma ◽ Hongqing Liu

Keyword(s): Neural Network ◽ Real Time ◽ Speech Enhancement ◽ Bone Conduction ◽ Speech Sound ◽ Smart Devices ◽ Scale Invariant ◽ The Neural Network ◽ Adaptive Noise ◽ Adaptive Noise Canceller

The quality and intelligibility of speech are usually impaired by background noise during internet voice calls. To solve this problem in the context of wearable smart devices, this paper introduces a dual-microphone beamformer assisted by a bone-conduction (BC) sensor, together with a simple recurrent unit (SRU)-based neural network postfilter, for real-time speech enhancement. Because the BC sensor is insensitive to environmental noise compared with a regular air-conduction (AC) microphone, accurate voice activity detection (VAD) can be obtained from the BC signal and incorporated into the adaptive noise canceller (ANC) and adaptive block matrix (ABM). The SRU-based postfilter consists of a recurrent neural network with a small number of parameters, which improves computational efficiency. Sub-band signal processing is designed to compress the input features of the neural network, and the scale-invariant signal-to-distortion ratio (SI-SDR) is used as the loss function to minimize distortion of the desired speech signal. Experimental results demonstrate that the proposed real-time speech enhancement system provides significant improvements in speech sound quality and intelligibility for all noise types and levels when compared with the AC-only beamformer with a postfiltering algorithm.
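The SI-SDR loss mentioned in this abstract can be illustrated with a short NumPy sketch of the commonly used definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. The abstract does not state the exact formulation the authors use, so the mean removal, the `eps` stabiliser, and the function names `si_sdr` and `si_sdr_loss` are assumptions.

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB for 1-D signals."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Assumption: remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    error = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(error, error) + eps)
    return 10.0 * np.log10(ratio)

def si_sdr_loss(estimate, reference):
    """Negated SI-SDR, so that minimising the loss maximises SI-SDR."""
    return -si_sdr(estimate, reference)
```

Because the target is rescaled to best match the estimate, multiplying the estimate by a constant leaves the score essentially unchanged, which is what makes the measure scale-invariant.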
