Time-Domain Joint Training Strategies of Speech Enhancement and Intent Classification Neural Models

Sensors ◽  
2022 ◽  
Vol 22 (1) ◽  
pp. 374
Author(s):  
Mohamed Nabih Ali ◽  
Daniele Falavigna ◽  
Alessio Brutti

Robustness against background noise and reverberation is essential for many real-world speech-based applications. One way to achieve this robustness is to employ a speech enhancement front-end that, independently of the back-end, removes the environmental perturbations from the target speech signal. However, although the enhancement front-end typically improves speech quality from an intelligibility perspective, it tends to introduce distortions that degrade the performance of subsequent processing modules. In this paper, we investigate strategies for jointly training neural models for both speech enhancement and the back-end by optimizing a combined loss function. In this way, the enhancement front-end is guided by the back-end to provide more effective enhancement. Unlike typical state-of-the-art approaches that rely on spectral features or neural embeddings, we operate in the time domain, processing raw waveforms in both components. As an application scenario, we consider intent classification in noisy environments. In particular, the front-end speech enhancement module is based on Wave-U-Net, while the intent classifier is implemented as a temporal convolutional network. Exhaustive experiments are reported on versions of the Fluent Speech Commands corpus contaminated with noises from the Microsoft Scalable Noisy Speech Dataset, providing insight into the most promising training approaches.
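The abstract above does not state the form of the combined loss. As a minimal sketch, assuming a weighted sum of a waveform reconstruction error for the front-end and a cross-entropy term for the intent classifier, it could look as follows. The function names, the weight `alpha`, and the choice of mean squared error are illustrative assumptions, not the authors' formulation.

```python
import math

def mse(enhanced, clean):
    """Mean squared error between two equal-length waveforms."""
    return sum((e - c) ** 2 for e, c in zip(enhanced, clean)) / len(clean)

def cross_entropy(logits, target):
    """Cross-entropy of one example, computed from raw class scores."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]

def joint_loss(enhanced, clean, logits, target, alpha=0.5):
    """Hypothetical combined objective; alpha weights the enhancement term."""
    enhancement_term = mse(enhanced, clean)
    classification_term = cross_entropy(logits, target)
    return alpha * enhancement_term + (1.0 - alpha) * classification_term
```

With `alpha=1.0` only the enhancement term is optimized and with `alpha=0.0` only the classification term, so intermediate values let the back-end error influence how the front-end is trained.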

2021 ◽  
Author(s):  
Bojian Yin ◽  
Federico Corradi ◽  
Sander M. Bohté

Inspired by more detailed modeling of biological neurons, spiking neural networks (SNNs) have been investigated both as more biologically plausible and potentially more powerful models of neural computation, and also with the aim of capturing the energy efficiency of biological neurons; the performance of such networks, however, has remained lacking compared to classical artificial neural networks (ANNs). Here, we demonstrate how a novel surrogate gradient combined with recurrent networks of tunable and adaptive spiking neurons yields state-of-the-art performance for SNNs on challenging time-domain benchmarks, such as speech and gesture recognition. This also exceeds the performance of standard classical recurrent neural networks (RNNs) and approaches that of the best modern ANNs. As these SNNs exhibit sparse spiking, we show that they are theoretically one to three orders of magnitude more computationally efficient than RNNs with comparable performance. Together, this positions SNNs as an attractive solution for AI hardware implementations.
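To make the two ingredients named above concrete, here is a minimal sketch of a single adaptive leaky integrate-and-fire neuron together with a smooth surrogate for the spike derivative. The update equations, the Gaussian surrogate shape, and all parameter values are assumptions chosen for illustration; the abstract does not specify the authors' exact neuron model or surrogate.

```python
import math

def surrogate_grad(v, width=0.5):
    """Gaussian-shaped stand-in for the derivative of the spike step function.

    v is the distance between membrane potential and threshold.
    """
    return math.exp(-(v ** 2) / (2 * width ** 2)) / (width * math.sqrt(2 * math.pi))

def run_adaptive_lif(currents, tau_m=20.0, tau_adp=100.0, b0=1.0, beta=1.8, dt=1.0):
    """Simulate one neuron whose threshold rises after each spike (assumed model)."""
    alpha = math.exp(-dt / tau_m)    # membrane decay per step
    rho = math.exp(-dt / tau_adp)    # adaptation decay per step
    u, eta, spike = 0.0, 0.0, 0
    spikes = []
    for i in currents:
        eta = rho * eta + (1 - rho) * spike       # adaptation trace
        theta = b0 + beta * eta                   # adaptive threshold
        u = alpha * u + (1 - alpha) * i - theta * spike  # leak, input, soft reset
        spike = 1 if u - theta > 0 else 0
        spikes.append(spike)
    return spikes
```

In training, the forward pass would use the hard threshold as in `run_adaptive_lif`, while the backward pass would substitute `surrogate_grad(u - theta)` for the undefined derivative of the step function.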


2014 ◽  
pp. 47-63
Author(s):  
Jacob Benesty ◽  
Jesper Jensen ◽  
Mads Graesboll Christensen ◽  
Jingdong Chen

2014 ◽  
Vol 1046 ◽  
pp. 384-387
Author(s):  
Jin Li ◽  
Kun Shen

Because traditional methods do not perform well in noisy environments, an improved method for speech enhancement based on Empirical Mode Decomposition (EMD) and Morphology Filtering (MF) is proposed. The method first uses EMD to obtain the Intrinsic Mode Functions (IMFs) and applies hard thresholding to them, then selects an appropriate structuring element to construct an MF that filters the remaining IMFs. Finally, the enhanced speech signal is reconstructed from the processed IMFs. Experimental results, based on comparing time-domain waveforms and spectrograms, show that the proposed method has a better de-noising effect. Moreover, the quality of the reconstructed enhanced speech signal is significantly improved.
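The two per-IMF operations mentioned above can be sketched in a few lines. This is an illustration only: the EMD step itself is omitted, and the flat structuring element of length `k` and the averaged open-close / close-open filter are common choices assumed here, not details given in the abstract.

```python
def hard_threshold(x, t):
    """Zero every sample whose magnitude does not exceed t."""
    return [v if abs(v) > t else 0.0 for v in x]

def erode(x, k):
    """Grey-scale erosion with a flat window of length k (truncated at edges)."""
    h, n = k // 2, len(x)
    return [min(x[max(0, i - h):min(n, i + h + 1)]) for i in range(n)]

def dilate(x, k):
    """Grey-scale dilation with a flat window of length k (truncated at edges)."""
    h, n = k // 2, len(x)
    return [max(x[max(0, i - h):min(n, i + h + 1)]) for i in range(n)]

def opening(x, k):
    """Erosion followed by dilation; removes narrow positive spikes."""
    return dilate(erode(x, k), k)

def closing(x, k):
    """Dilation followed by erosion; fills narrow negative spikes."""
    return erode(dilate(x, k), k)

def open_close_average(x, k=3):
    """Average of open-close and close-open outputs (assumed filter design)."""
    oc = closing(opening(x, k), k)
    co = opening(closing(x, k), k)
    return [(a + b) / 2 for a, b in zip(oc, co)]
```

For example, `open_close_average([0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0], 3)` suppresses the isolated one-sample spike, while a constant signal passes through unchanged.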


2020 ◽  
Author(s):  
Matthias Ellmer ◽  
David Wiese ◽  
Christopher McCullough ◽  
Dah-Ning Yuan ◽  
Eugene Fahnestock

Developing meaningful uncertainty quantifications for GRACE or GRACE-FO derived products, e.g. water storage anomalies, requires a robust understanding of the information and noise content in the observables employed in their estimation. The stochastic models for GRACE and GRACE-FO K-Band, and GPS carrier phase and pseudorange observables employed in upcoming JPL solutions will be presented. Within these models, the time-domain correlations for each of the observations are estimated, and then applied in the least squares estimate of monthly gravity field solutions. Reproducing results from other groups, the resulting formal errors of monthly solutions are improved. We compare this approach to the current state of the art at JPL, and show that noise content in the determined gravity field solutions is reduced. We further demonstrate the application of this method to data from the GRACE-FO Laser Ranging Interferometer.
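One simple way to apply estimated time-domain correlations in a least squares fit is to whiten the observations before solving. The sketch below assumes first-order autoregressive (AR(1)) noise and a single-parameter linear model; both are simplifications made for illustration and are not the stochastic models described in the abstract.

```python
def estimate_phi(residuals):
    """Estimate the AR(1) coefficient from a series of fit residuals."""
    num = sum(residuals[t] * residuals[t - 1] for t in range(1, len(residuals)))
    den = sum(r * r for r in residuals[:-1])
    return num / den

def whiten(series, phi):
    """Remove AR(1) correlation: w[t] = s[t] - phi * s[t-1]."""
    return [series[t] - phi * series[t - 1] for t in range(1, len(series))]

def whitened_slope(x, y, phi):
    """Least squares estimate of a in y = a * x + noise, after whitening."""
    xw, yw = whiten(x, phi), whiten(y, phi)
    return sum(a * b for a, b in zip(xw, yw)) / sum(a * a for a in xw)
```

Whitening both sides with the same filter is equivalent to weighting the least squares problem by the inverse of the assumed noise covariance, apart from the treatment of the first sample, which this sketch simply drops.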


Author(s):  
Daniel Merino Hoyos ◽  
Erik Falkenberg ◽  
Petter Stuberg

While the size of newly built semi-submersibles is steadily increasing, their mooring systems are not experiencing a similar change. Most of the sixth-generation drilling units rely heavily on thruster-assisted mooring to increase their operability in harsh environments and shallow waters. Frequency domain programs are commonly used to calculate mooring line tensions in the design analyses. This analysis technique assumes linearity in both the mooring system and the thruster assist controller. The study presented in this paper examines the validity of these assumptions by comparing frequency and time domain analyses in two state-of-the-art analysis programs. The linear thruster assist controller in frequency domain analyses is benchmarked against the Kalman filter-based controller in time domain simulations and against a real thruster assist controller coupled with the time domain software.

