Neural Network Distillation on IoT Platforms for Sound Event Detection

Author(s):  
Gianmarco Cerutti ◽  
Rahul Prasad ◽  
Alessio Brutti ◽  
Elisabetta Farella
2021 ◽  
Author(s):  
Janek Ebbers ◽  
Reinhold Haeb-Umbach

In this paper we present our system for the detection and classification of acoustic scenes and events (DCASE) 2020 Challenge Task 4: Sound event detection and separation in domestic environments. We introduce two new models: the forward-backward convolutional recurrent neural network (FBCRNN) and the tag-conditioned convolutional neural network (CNN). The FBCRNN employs two recurrent neural network (RNN) classifiers sharing the same CNN for preprocessing. With one RNN processing a recording in the forward direction and the other in the backward direction, the two networks are trained to jointly predict audio tags, i.e., weak labels, at each time step within a recording, given that at each time step they have jointly processed the whole recording. The proposed training encourages the classifiers to tag events as soon as possible. Therefore, after training, the networks can be applied to shorter audio segments of, e.g., 200 ms, allowing sound event detection (SED). Further, we propose a tag-conditioned CNN to complement SED. It is trained to predict strong labels while using (predicted) tags, i.e., weak labels, as additional input. For training, pseudo strong labels from an FBCRNN ensemble are used. The presented system scored fourth and third place in the systems and teams rankings, respectively. Subsequent improvements allow our system to even outperform the challenge baseline and winner systems on average by 18.0 % and 2.2 % event-based F1-score, respectively, on the validation set. Source code is publicly available at https://github.com/fgnt/pb_sed
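
As a rough illustration of the forward-backward idea (a minimal sketch, not the authors' pb_sed implementation), the PyTorch snippet below shows two GRU branches sharing one CNN front-end; their per-frame outputs are combined and supervised with the clip-level tags. Layer sizes, the GRU choice, and the averaging of the branch outputs are assumptions made for brevity.

```python
import torch
import torch.nn as nn

class FBCRNNSketch(nn.Module):
    """Minimal sketch of the forward-backward CRNN idea (not the pb_sed code).

    A shared CNN extracts frame-wise features from a log-mel spectrogram.
    One GRU reads the feature sequence forward, another reads it backward, so
    at any frame t the two branches together have seen the whole recording and
    their combined output can be supervised with the clip-level (weak) tags.
    """

    def __init__(self, n_mels=64, n_classes=10, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                 # shared CNN front-end
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d((2, 1)),                 # pool frequency, keep time
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((2, 1)),
        )
        feat_dim = 64 * (n_mels // 4)
        self.rnn_fwd = nn.GRU(feat_dim, hidden, batch_first=True)
        self.rnn_bwd = nn.GRU(feat_dim, hidden, batch_first=True)
        self.clf_fwd = nn.Linear(hidden, n_classes)
        self.clf_bwd = nn.Linear(hidden, n_classes)

    def forward(self, mel):                       # mel: (batch, 1, n_mels, frames)
        feats = self.cnn(mel)                     # (batch, 64, n_mels // 4, frames)
        b, c, f, t = feats.shape
        feats = feats.permute(0, 3, 1, 2).reshape(b, t, c * f)
        h_fwd, _ = self.rnn_fwd(feats)                   # at frame t: has seen 1..t
        h_bwd, _ = self.rnn_bwd(torch.flip(feats, [1]))  # at frame t: has seen T..t
        h_bwd = torch.flip(h_bwd, [1])                   # re-align to 1..T
        # joint per-frame tag prediction; every frame is supervised with the
        # clip-level weak labels, which pushes both branches to tag early
        return torch.sigmoid((self.clf_fwd(h_fwd) + self.clf_bwd(h_bwd)) / 2)
```

Because each branch is rewarded for producing correct tags as early as possible, the trained classifiers can then be run on short segments (the roughly 200 ms windows mentioned above) to localize events in time.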


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 147337-147348
Author(s):  
Keming Zhang ◽  
Yuanwen Cai ◽  
Yuan Ren ◽  
Ruida Ye ◽  
Liang He

Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 1883 ◽  
Author(s):  
Kyoungjin Noh ◽  
Joon-Hyuk Chang

In this paper, we propose joint optimization of deep neural network (DNN)-supported dereverberation and beamforming for convolutional recurrent neural network (CRNN)-based sound event detection (SED) in multi-channel environments. First, short-time Fourier transform (STFT) coefficients are calculated from multi-channel audio signals recorded in noisy and reverberant environments and are enhanced by DNN-supported weighted prediction error (WPE) dereverberation using estimated masks. Next, the STFT coefficients of the dereverberated multi-channel audio signals are passed to a DNN-supported minimum variance distortionless response (MVDR) beamformer, in which beamforming is carried out with the source and noise masks estimated by the DNN. As a result, enhanced single-channel STFT coefficients are obtained at the output and fed to the CRNN-based SED system, and the three modules are then jointly trained with a single loss function designed for SED. Furthermore, to ease the difficulty of training a deep learning model for SED caused by the imbalance in the amount of data for each class, the focal loss is used as the loss function. Experimental results show that joint training of DNN-supported dereverberation and beamforming with the SED model under the supervision of the focal loss significantly improves performance in noisy and reverberant environments.
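
The focal loss mentioned above is a standard remedy for class imbalance; a minimal sketch of a frame-wise, multi-label variant like the one the abstract refers to is given below, assuming sigmoid outputs and the usual gamma/alpha hyper-parameters (values chosen here only for illustration, not taken from the paper).

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Frame-wise, multi-label binary focal loss (Lin et al., 2017 style).

    logits, targets: tensors of shape (batch, frames, classes), targets in {0, 1}.
    gamma focuses the loss on hard examples; alpha balances positives/negatives.
    The exact values and any per-class weighting used in the paper are not
    reproduced here.
    """
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = targets * p + (1 - targets) * (1 - p)             # prob. of the true label
    alpha_t = targets * alpha + (1 - targets) * (1 - alpha)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()       # easy frames contribute little

# toy usage: 8 clips, 500 frames, 10 event classes
logits = torch.randn(8, 500, 10, requires_grad=True)
targets = torch.randint(0, 2, (8, 500, 10)).float()
loss = binary_focal_loss(logits, targets)
loss.backward()
```

The (1 - p_t)^gamma factor down-weights frames the model already classifies confidently, so rare and hard classes dominate the gradient; in the proposed system this single SED loss supervises the WPE dereverberation, MVDR beamforming, and CRNN modules end to end.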

