Sound Event Detection Using Derivative Features in Deep Neural Networks

We propose using derivative features for sound event detection based on deep neural networks. As input to the networks, we used log-mel-filterbank features and their first and second derivatives for each frame of the audio signal. Two deep neural networks were used to evaluate the effectiveness of these derivative features. Specifically, a convolutional recurrent neural network (CRNN) was constructed by combining a convolutional neural network and a recurrent neural network (RNN), followed by a feed-forward neural network (FNN) acting as a classification layer. In addition, a mean-teacher model based on an attention CRNN was used. Both models had an average pooling layer at the output so that weakly labeled and unlabeled audio data could be used during model training. Across the various training conditions, which differed in neural network architecture and training set, the derivative features yielded a consistent performance improvement. Experiments on audio data from the Detection and Classification of Acoustic Scenes and Events 2018 and 2019 challenges indicated a maximum relative improvement of 16.9% in terms of the F-score.
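As a rough illustration of what stacking a feature with its first and second derivatives can look like, the sketch below uses the common regression-based delta formula over a small window of neighboring frames. The abstract does not say how the derivatives were computed, so the formula, the window width, the edge handling, and the function names `delta` and `stack_with_deltas` are all assumptions made for this example.

```python
def delta(frames, width=2):
    """Regression-based time derivative of per-frame feature vectors.

    frames: list of equal-length lists of floats (one list per frame).
    width:  number of neighboring frames used on each side (assumed value).
    Frames beyond the edges are replaced by the first/last frame.
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_with_deltas(frames, width=2):
    """Concatenate static, first-derivative and second-derivative features per frame."""
    d1 = delta(frames, width)
    d2 = delta(d1, width)  # second derivative = delta of the delta
    return [f + a + b for f, a, b in zip(frames, d1, d2)]


# Toy input: 6 frames with a single (hypothetical) log-mel band that rises linearly.
toy_frames = [[float(t)] for t in range(6)]
stacked = stack_with_deltas(toy_frames)
```

Each frame in `stacked` has three times as many values as the input frame, so a network taking this input sees the static band energies alongside their local slope and curvature.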

Forward-Backward Convolutional Recurrent Neural Networks and Tag-Conditioned Convolutional Neural Networks for Weakly Labeled Semi-supervised Sound Event Detection

10.31219/osf.io/m5eba ◽

2021 ◽

Author(s):

Janek Ebbers ◽

Reinhold Haeb-Umbach

Keyword(s):

Neural Network ◽

Neural Networks ◽

Recurrent Neural Network ◽

Event Detection ◽

Forward Direction ◽

Time Step ◽

Sound Event ◽

Sound Event Detection ◽

Validation Set ◽

Classification

In this paper we present our system for the detection and classification of acoustic scenes and events (DCASE) 2020 Challenge Task 4: Sound event detection and separation in domestic environments. We introduce two new models: the forward-backward convolutional recurrent neural network (FBCRNN) and the tag-conditioned convolutional neural network (CNN). The FBCRNN employs two recurrent neural network (RNN) classifiers sharing the same CNN for preprocessing. With one RNN processing a recording in forward direction and the other in backward direction, the two networks are trained to jointly predict audio tags, i.e., weak labels, at each time step within a recording, given that at each time step they have jointly processed the whole recording. The proposed training encourages the classifiers to tag events as soon as possible. Therefore, after training, the networks can be applied to shorter audio segments of, e.g., 200 ms, allowing sound event detection (SED). Further, we propose a tag-conditioned CNN to complement SED. It is trained to predict strong labels while using (predicted) tags, i.e., weak labels, as additional input. For training, pseudo strong labels from an FBCRNN ensemble are used. The presented system scored the fourth and third place in the systems and teams rankings, respectively. Subsequent improvements allow our system to even outperform the challenge baseline and winner systems on average by, respectively, 18.0% and 2.2% event-based F1-score on the validation set. Source code is publicly available at https://github.com/fgnt/pb_sed
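The idea of applying a tagger to short segments to obtain time-resolved detections can be illustrated with a generic sliding-window sketch. This is not the authors' code from the linked repository: the function `sliding_window_sed`, the `tagger` callable, the window length in frames, and the decision threshold are all hypothetical stand-ins, and the toy tagger simply averages its input instead of running a trained network.

```python
def sliding_window_sed(frames, tagger, window, threshold=0.5):
    """Apply a clip-level tagger to a short window centered on every frame.

    frames:    list of per-frame feature vectors.
    tagger:    callable mapping a list of frames to one score per class
               (hypothetical stand-in for a trained tagging model).
    window:    window length in frames (assumed; e.g. the frame count covering 200 ms).
    threshold: score above which a class is marked active (assumed value).
    Returns one list of booleans (one per class) for every frame.
    """
    half = window // 2
    decisions = []
    for t in range(len(frames)):
        segment = frames[max(0, t - half): t + half + 1]
        scores = tagger(segment)
        decisions.append([score > threshold for score in scores])
    return decisions


def toy_tagger(segment):
    """Toy single-class tagger: the score is the mean of the first feature."""
    return [sum(frame[0] for frame in segment) / len(segment)]


# Toy recording: the (single) class is active in frames 3 to 5.
toy_frames = [[v] for v in [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]]
activity = sliding_window_sed(toy_frames, toy_tagger, window=3)
```

The shorter the window, the finer the temporal resolution of the decisions, which is why a tagger that can recognize events from little context is useful for detection.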

Polyphonic sound event detection using multi label deep neural networks

2015 International Joint Conference on Neural Networks (IJCNN) ◽

10.1109/ijcnn.2015.7280624 ◽

2015 ◽

Cited By ~ 62

Author(s):

Emre Cakir ◽

Toni Heittola ◽

Heikki Huttunen ◽

Tuomas Virtanen

Keyword(s):

Neural Networks ◽

Event Detection ◽

Deep Neural Networks ◽

Sound Event ◽

Sound Event Detection

Multi-label vs. combined single-label sound event detection with deep neural networks

2015 23rd European Signal Processing Conference (EUSIPCO) ◽

10.1109/eusipco.2015.7362845 ◽

2015 ◽

Cited By ~ 10

Author(s):

Emre Cakir ◽

Toni Heittola ◽

Heikki Huttunen ◽

Tuomas Virtanen

Keyword(s):

Neural Networks ◽

Event Detection ◽

Deep Neural Networks ◽

Sound Event ◽

Sound Event Detection

Sound event detection using deep neural networks

TELKOMNIKA (Telecommunication Computing Electronics and Control) ◽

10.12928/telkomnika.v18i5.14246 ◽

2020 ◽

Vol 18 (5) ◽

pp. 2587

Author(s):

Suk-Hwan Jung ◽

Yong-Joo Chung

Keyword(s):

Neural Networks ◽

Event Detection ◽

Deep Neural Networks ◽

Sound Event ◽

Sound Event Detection

Neural Network Distillation on IoT Platforms for Sound Event Detection

10.21437/interspeech.2019-2394 ◽

2019 ◽

Cited By ~ 3

Author(s):

Gianmarco Cerutti ◽

Rahul Prasad ◽

Alessio Brutti ◽

Elisabetta Farella

Keyword(s):

Neural Network ◽

Event Detection ◽

Sound Event ◽

Iot Platforms ◽

Sound Event Detection

Sound Event Detection by Consistency Training and Pseudo-Labeling With Feature-Pyramid Convolutional Recurrent Neural Networks

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9414350 ◽

2021 ◽

Author(s):

Chih-Yuan Koh ◽

You-Siang Chen ◽

Yi-Wen Liu ◽

Mingsian R. Bai

Keyword(s):

Neural Networks ◽

Event Detection ◽

Recurrent Neural Networks ◽

Sound Event ◽

Feature Pyramid ◽

Sound Event Detection

Reynolds averaged turbulence modelling using deep neural networks with embedded invariance

Journal of Fluid Mechanics ◽

10.1017/jfm.2016.615 ◽

2016 ◽

Vol 807 ◽

pp. 155-166 ◽

Cited By ~ 274

Author(s):

Julia Ling ◽

Andrew Kurzawski ◽

Jeremy Templeton

Keyword(s):

Neural Network ◽

Neural Networks ◽

Reynolds Stress ◽

Network Architecture ◽

Eddy Viscosity ◽

Deep Neural Networks ◽

Test Cases ◽

Neural Network Architecture ◽

Stress Anisotropy ◽

Anisotropy Tensor

There exists significant demand for improved Reynolds-averaged Navier–Stokes (RANS) turbulence models that are informed by and can represent a richer set of turbulence physics. This paper presents a method of using deep neural networks to learn a model for the Reynolds stress anisotropy tensor from high-fidelity simulation data. A novel neural network architecture is proposed which uses a multiplicative layer with an invariant tensor basis to embed Galilean invariance into the predicted anisotropy tensor. It is demonstrated that this neural network architecture provides improved prediction accuracy compared with a generic neural network architecture that does not embed this invariance property. The Reynolds stress anisotropy predictions of this invariant neural network are propagated through to the velocity field for two test cases. For both test cases, significant improvement versus baseline RANS linear eddy viscosity and nonlinear eddy viscosity models is demonstrated.
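A multiplicative output layer of the kind described can be read as forming the predicted tensor as a weighted sum of basis tensors, with the weights supplied by the network. The sketch below shows only that final combination step; the helper name `combine_tensor_basis`, the two example basis tensors, and the coefficient values are made up for illustration and do not come from the paper.

```python
def combine_tensor_basis(coeffs, basis):
    """Return sum_n coeffs[n] * basis[n] for a list of square matrices.

    coeffs: scalar weights, e.g. the outputs of a network's last hidden layer.
    basis:  list of equally sized square matrices (lists of lists of floats).
    """
    size = len(basis[0])
    return [
        [sum(g * t[i][j] for g, t in zip(coeffs, basis)) for j in range(size)]
        for i in range(size)
    ]


# Two made-up symmetric, traceless 3x3 basis tensors (illustrative only).
basis_a = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 0.0]]
basis_b = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

# Hypothetical coefficients standing in for network outputs.
predicted = combine_tensor_basis([0.5, 2.0], [basis_a, basis_b])
```

Because the output is built only from the basis tensors and scalar weights, any symmetry shared by all basis tensors (here, being symmetric and traceless) carries over to the prediction.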

Non-Negative Matrix Factorization-Convolutional Neural Network (NMF-CNN) for Sound Event Detection

Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019) ◽

10.33682/50ef-dx29 ◽

2019 ◽

Cited By ~ 1

Author(s):

Teck Kai Chan ◽

Cheng Siong Chin ◽

Ye Li

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Event Detection ◽

Matrix Factorization ◽

Sound Event ◽

Sound Event Detection ◽

Non Negative Matrix Factorization

Part-of-Speech Tagging via Deep Neural Networks for Northern-Ethiopic Languages

Information Technology And Control ◽

10.5755/j01.itc.49.4.26808 ◽

2020 ◽

Vol 49 (4) ◽

pp. 482-494

Author(s):

Jurgita Kapočiūtė-Dzikienė ◽

Senait Gebremichael Tesfagergish

Keyword(s):

Neural Network ◽

Neural Networks ◽

Language Processing ◽

Deep Neural Networks ◽

Short Term Memory ◽

Parameter Tuning ◽

Feed Forward Neural Network ◽

Pos Tagging ◽

Part Of Speech ◽

Pos Tagger

Deep Neural Networks (DNNs) have proven to be especially successful in the area of Natural Language Processing (NLP), including Part-Of-Speech (POS) tagging, which is the process of mapping words to their corresponding POS labels depending on the context. Despite the recent development of language technologies, low-resourced languages (such as the East African Tigrinya language) have received too little attention. We investigate the effectiveness of Deep Learning (DL) solutions for the low-resourced Tigrinya language of the Northern-Ethiopic branch. We have selected Tigrinya as the testbed example and have tested state-of-the-art DL approaches, seeking to build the most accurate POS tagger. We have evaluated DNN classifiers (Feed Forward Neural Network – FFNN, Long Short-Term Memory method – LSTM, Bidirectional LSTM, and Convolutional Neural Network – CNN) on top of neural word2vec word embeddings with a small training corpus known as the Nagaoka Tigrinya Corpus. To determine the best DNN classifier type, its architecture, and its hyper-parameter set, both manual and automatic hyper-parameter tuning were performed. The BiLSTM method proved to be the most suitable for our task: it achieved the highest accuracy, 92%, which is 65% above the random baseline.

Sound event detection in real life audio using perceptual linear predictive feature with neural network

2018 15th International Bhurban Conference on Applied Sciences and Technology (IBCAST) ◽

10.1109/ibcast.2018.8312252 ◽

2018 ◽

Cited By ~ 1

Author(s):

Khizer Feroze ◽

Abdur Rahman Maud

Keyword(s):

Neural Network ◽

Event Detection ◽

Real Life ◽

Sound Event ◽

Sound Event Detection ◽

Predictive Feature
