sound event detection Latest Research Papers

Weakly labeled sound event detection (WSED) is an important task as it can facilitate the data collection efforts before constructing a strongly labeled sound event dataset. Recent high performance in deep learning-based WSED’s exploited using a segmentation mask for detecting the target feature map. However, achieving accurate detection performance was limited in real streaming audio due to the following reasons. First, the convolutional neural networks (CNN) employed in the segmentation mask extraction process do not appropriately highlight the importance of feature as the feature is extracted without pooling operations, and, concurrently, a small size kernel forces the receptive field small, making it difficult to learn various patterns. Second, as feature maps are obtained in an end-to-end fashion, the WSED model would be weak to unknown contents in the wild. These limitations would lead to generating undesired feature maps, such as noise in the unseen environment. This paper addresses these issues by constructing a more efficient model by employing a gated linear unit (GLU) and dilated convolution to improve the problems of de-emphasizing importance and lack of receptive field. In addition, this paper proposes pseudo-label-based learning for classifying target contents and unknown contents by adding ’noise label’ and ’noise loss’ so that unknown contents can be separated as much as possible through the noise label. The experiment is performed by mixing DCASE 2018 task1 acoustic scene data and task2 sound event data. The experimental results show that the proposed SED model achieves the best F1 performance with 59.7% at 0 SNR, 64.5% at 10 SNR, and 65.9% at 20 SNR. These results represent an improvement of 17.7%, 16.9%, and 16.5%, respectively, over the baseline.

Download Full-text

An Analysis of Sound Event Detection under Acoustic Degradation Using Multi-Resolution Systems

Applied Sciences ◽

10.3390/app112311561 ◽

2021 ◽

Vol 11 (23) ◽

pp. 11561

Author(s):

Diego de Benito-Gorrón ◽

Daniel Ramos ◽

Doroteo T. Toledano

Keyword(s):

Event Detection ◽

Dynamic Range ◽

Detection Systems ◽

Dynamic Range Compression ◽

Sound Event ◽

Event Category ◽

Low Pass ◽

Sound Event Detection ◽

Acoustic Conditions

The Sound Event Detection task aims to determine the temporal locations of acoustic events in audio clips. In recent years, the relevance of this field is rising due to the introduction of datasets such as Google AudioSet or DESED (Domestic Environment Sound Event Detection) and competitive evaluations like the DCASE Challenge (Detection and Classification of Acoustic Scenes and Events). In this paper, we analyze the performance of Sound Event Detection systems under diverse artificial acoustic conditions such as high- or low-pass filtering and clipping or dynamic range compression, as well as under an scenario of high overlap between events. For this purpose, the audio was obtained from the Evaluation subset of the DESED dataset, whereas the systems were trained in the context of the DCASE Challenge 2020 Task 4. Our systems are based upon the challenge baseline, which consists of a Convolutional-Recurrent Neural Network trained using the Mean Teacher method, and they employ a multiresolution approach which is able to improve the Sound Event Detection performance through the use of several resolutions during the extraction of Mel-spectrogram features. We provide insights on the benefits of this multiresolution approach in different acoustic settings, and compare the performance of the single-resolution systems in the aforementioned scenarios when using different resolutions. Furthermore, we complement the analysis of the performance in the high-overlap scenario by assessing the degree of overlap of each event category in sound event detection datasets.

Download Full-text

MSFF-net: Multi-scale feature fusing networks with dilated mixed convolution and cascaded parallel framework for sound event detection

Digital Signal Processing ◽

10.1016/j.dsp.2021.103319 ◽

2021 ◽

pp. 103319

Author(s):

Yingbin Wang ◽

Guanghui Zhao ◽

Kai Xiong ◽

Guangming Shi

Keyword(s):

Event Detection ◽

Scale Feature ◽

Multi Scale ◽

Sound Event ◽

Sound Event Detection

Download Full-text

Sound Event Detection: A Wavelet Based Approach For Weakly Labelled Data

10.1109/ibssc53889.2021.9673263 ◽

2021 ◽

Author(s):

Amit Kumar ◽

Vinal Patel

Keyword(s):

Event Detection ◽

Sound Event ◽

Sound Event Detection

Download Full-text

Affinity Mixup for Weakly Supervised Sound Event Detection

10.1109/mlsp52302.2021.9596270 ◽

2021 ◽

Author(s):

Mohammad Rasool Izadi ◽

Robert Stevenson ◽

Laura Kloepper

Keyword(s):

Event Detection ◽

Sound Event ◽

Sound Event Detection ◽

Weakly Supervised

Download Full-text

Crowdsourcing Strong Labels for Sound Event Detection

10.1109/waspaa52581.2021.9632761 ◽

2021 ◽

Author(s):

Irene Martin-Morato ◽

Manu Harju ◽

Annamaria Mesaros

Keyword(s):

Event Detection ◽

Sound Event ◽

Sound Event Detection

Download Full-text

Sound Event Detection with Adaptive Frequency Selection

10.1109/waspaa52581.2021.9632798 ◽

2021 ◽

Author(s):

Zhepei Wang ◽

Jonah Casebeer ◽

Adam Clemmitt ◽

Efthymios Tzinis ◽

Paris Smaragdis

Keyword(s):

Event Detection ◽

Frequency Selection ◽

Sound Event ◽

Sound Event Detection

Download Full-text

Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification

Sensors ◽

10.3390/s21206718 ◽

2021 ◽

Vol 21 (20) ◽

pp. 6718

Author(s):

Jin-Young Son ◽

Joon-Hyuk Chang

Keyword(s):

Time Domain ◽

Event Detection ◽

Noise Suppression ◽

Classification Performance ◽

Fine Tuning ◽

Model Complexity ◽

Convolutional Network ◽

Sound Event ◽

Sound Event Detection ◽

Noise Robust

Sound event detection (SED) recognizes the corresponding sound event of an incoming signal and estimates its temporal boundary. Although SED has been recently developed and used in various fields, achieving noise-robust SED in a real environment is typically challenging owing to the performance degradation due to ambient noise. In this paper, we propose combining a pretrained time-domain speech-separation-based noise suppression network (NS) and a pretrained classification network to improve the SED performance in real noisy environments. We use group communication with a context codec method (GC3)-equipped temporal convolutional network (TCN) for the noise suppression model and a convolutional recurrent neural network for the SED model. The former significantly reduce the model complexity while maintaining the same TCN module and performance as a fully convolutional time-domain audio separation network (Conv-TasNet). We also do not update the weights of some layers (i.e., freeze) in the joint fine-tuning process and add an attention module in the SED model to further improve the performance and prevent overfitting. We evaluate our proposed method using both simulation and real recorded datasets. The experimental results show that our method improves the classification performance in a noisy environment under various signal-to-noise-ratio conditions.

Download Full-text

Forward-Backward Convolutional Recurrent Neural Networks and Tag-Conditioned Convolutional Neural Networks for \\Weakly Labeled Semi-supervised Sound Event Detection

10.31219/osf.io/m5eba ◽

2021 ◽

Author(s):

Janek Ebbers ◽

Reinhold Haeb-Umbach

Keyword(s):

Neural Network ◽

Neural Networks ◽

Recurrent Neural Network ◽

Event Detection ◽

Forward Direction ◽

Time Step ◽

Sound Event ◽

Sound Event Detection ◽

Validation Set ◽

Classi Fication

In this paper we present our system for thedetection and classi-fication of acoustic scenes and events (DCASE) 2020 ChallengeTask 4: Sound event detection and separation in domestic envi-ronments. We introduce two new models: the forward-backwardconvolutional recurrent neural network (FBCRNN) and the tag-conditioned convolutional neural network (CNN). The FBCRNNemploys two recurrent neural network (RNN) classifiers sharing thesame CNN for preprocessing. With one RNN processing a record-ing in forward direction and the other in backward direction, thetwo networks are trained to jointly predict audio tags, i.e., weak la-bels, at each time step within a recording, given that at each timestep they have jointly processed the whole recording. The pro-posed training encourages the classifiers to tag events as soon aspossible. Therefore, after training, the networks can be appliedto shorter audio segments of, e.g.,200 ms, allowing sound eventdetection (SED). Further, we propose a tag-conditioned CNN tocomplement SED. It is trained to predict strong labels while using(predicted) tags, i.e., weak labels, as additional input. For train-ing pseudo strong labels from a FBCRNN ensemble are used. Thepresented system scored the fourth and third place in the systemsand teams rankings, respectively. Subsequent improvements allowour system to even outperform the challenge baseline and winnersystems in average by, respectively,18.0 %and2.2 %event-basedF1-score on the validation set. Source code is publicly available athttps://github.com/fgnt/pb_sed

Download Full-text

sound event detection
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Anomalous sound event detection: A survey of machine learning based methods and applications

Sound Event Detection by Pseudo-Labeling in Weakly Labeled Dataset

An Analysis of Sound Event Detection under Acoustic Degradation Using Multi-Resolution Systems

MSFF-net: Multi-scale feature fusing networks with dilated mixed convolution and cascaded parallel framework for sound event detection

Sound Event Detection: A Wavelet Based Approach For Weakly Labelled Data

Affinity Mixup for Weakly Supervised Sound Event Detection

Crowdsourcing Strong Labels for Sound Event Detection

Sound Event Detection with Adaptive Frequency Selection

Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification

Forward-Backward Convolutional Recurrent Neural Networks and Tag-Conditioned Convolutional Neural Networks for \\Weakly Labeled Semi-supervised Sound Event Detection

Export Citation Format

sound event detectionRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Anomalous sound event detection: A survey of machine learning based methods and applications

Sound Event Detection by Pseudo-Labeling in Weakly Labeled Dataset

An Analysis of Sound Event Detection under Acoustic Degradation Using Multi-Resolution Systems

MSFF-net: Multi-scale feature fusing networks with dilated mixed convolution and cascaded parallel framework for sound event detection

Sound Event Detection: A Wavelet Based Approach For Weakly Labelled Data

Affinity Mixup for Weakly Supervised Sound Event Detection

Crowdsourcing Strong Labels for Sound Event Detection

Sound Event Detection with Adaptive Frequency Selection

Attention-Based Joint Training of Noise Suppression and Sound Event Detection for Noise-Robust Classification

Forward-Backward Convolutional Recurrent Neural Networks and Tag-Conditioned Convolutional Neural Networks for \\Weakly Labeled Semi-supervised Sound Event Detection

sound event detection
Recently Published Documents