Sound Event Detection by Pseudo-Labeling in Weakly Labeled Dataset

Weakly labeled sound event detection (WSED) is an important task because it can ease data collection before a strongly labeled sound event dataset is constructed. Recent high-performing deep learning-based WSED methods use a segmentation mask to detect the target feature map. However, their detection accuracy has been limited on real streaming audio for the following reasons. First, the convolutional neural networks (CNNs) used in the segmentation mask extraction process do not appropriately highlight the importance of features, because the features are extracted without pooling operations; at the same time, a small kernel size keeps the receptive field small, making it difficult to learn various patterns. Second, because feature maps are obtained in an end-to-end fashion, the WSED model is vulnerable to unknown content in the wild. These limitations can lead to undesired feature maps, such as noise in unseen environments. This paper addresses these issues by constructing a more efficient model that employs a gated linear unit (GLU) and dilated convolution to mitigate the de-emphasis of feature importance and the limited receptive field. In addition, this paper proposes pseudo-label-based learning that distinguishes target content from unknown content by adding a 'noise label' and a 'noise loss', so that unknown content can be separated as much as possible through the noise label. The experiment is performed by mixing DCASE 2018 Task 1 acoustic scene data and Task 2 sound event data. The experimental results show that the proposed SED model achieves the best F1 performance, with 59.7% at 0 dB SNR, 64.5% at 10 dB SNR, and 65.9% at 20 dB SNR. These results represent improvements of 17.7%, 16.9%, and 16.5%, respectively, over the baseline.
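
The two architectural ideas named in this abstract, the gated linear unit and dilated convolution, can be illustrated with a minimal pure-Python sketch. It is not the paper's model: the function names, the 1-D setting, the kernel size, and the dilation schedule are all assumptions chosen for illustration. It shows how dilation enlarges the receptive field of a small kernel, and how a sigmoid gate rescales a linear path element by element.

```python
import math


def receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of stacked stride-1 convolutions.

    Each layer with dilation d adds (kernel_size - 1) * d frames.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dilated_conv1d(x, weights, dilation):
    """'Valid' 1-D dilated cross-correlation over a list of floats."""
    span = (len(weights) - 1) * dilation
    return [
        sum(w * x[t + k * dilation] for k, w in enumerate(weights))
        for t in range(len(x) - span)
    ]


def glu_block(x, w_lin, w_gate, dilation):
    """Gated linear unit: the linear path is multiplied elementwise by a
    sigmoid gate computed from the same input (hypothetical weights)."""
    lin = dilated_conv1d(x, w_lin, dilation)
    gate = dilated_conv1d(x, w_gate, dilation)
    return [a * sigmoid(g) for a, g in zip(lin, gate)]


if __name__ == "__main__":
    # Three layers with kernel size 3: plain vs. exponentially dilated.
    print(receptive_field(3, [1, 1, 1]))
    print(receptive_field(3, [1, 2, 4]))
    print(glu_block([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 0.0, -1.0], [0.0, 0.0, 0.0], 2))
```

In this sketch, three undilated layers with kernel size 3 cover 7 frames, while dilations of 1, 2, and 4 cover 15 frames with the same number of weights. A gate value near 0 suppresses a frame and a value near 1 passes it through, which is one way a model can emphasize important features without pooling.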

Voice Activity Detection in the Wild via Weakly Supervised Sound Event Detection

10.21437/interspeech.2020-995 ◽

2020 ◽

Author(s):

Yefei Chen ◽

Heinrich Dinkel ◽

Mengyue Wu ◽

Kai Yu

Keyword(s):

Event Detection ◽

Voice Activity Detection ◽

Activity Detection ◽

Sound Event ◽

In The Wild ◽

Sound Event Detection ◽

Weakly Supervised ◽

Voice Activity

A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification

10.21437/interspeech.2017-1469 ◽

2017 ◽

Cited By ~ 2

Author(s):

Yun Wang ◽

Florian Metze

Keyword(s):

Transfer Learning ◽

Event Detection ◽

Sound Event ◽

Feature Extractor ◽

Sound Event Detection ◽

Connectionist Temporal Classification

An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection

10.21437/interspeech.2020-2329 ◽

2020 ◽

Author(s):

Xu Zheng ◽

Yan Song ◽

Jie Yan ◽

Li-Rong Dai ◽

Ian McLoughlin ◽

...

Keyword(s):

Supervised Learning ◽

Event Detection ◽

Learning Method ◽

Sound Event ◽

Sound Event Detection

Neural Network Distillation on IoT Platforms for Sound Event Detection

10.21437/interspeech.2019-2394 ◽

2019 ◽

Cited By ~ 3

Author(s):

Gianmarco Cerutti ◽

Rahul Prasad ◽

Alessio Brutti ◽

Elisabetta Farella

Keyword(s):

Neural Network ◽

Event Detection ◽

Sound Event ◽

Iot Platforms ◽

Sound Event Detection

Self-Training for Sound Event Detection in Audio Mixtures

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9414450 ◽

2021 ◽

Author(s):

Sangwook Park ◽

Ashwin Bellur ◽

David K. Han ◽

Mounya Elhilali

Keyword(s):

Event Detection ◽

Sound Event ◽

Sound Event Detection

Sound Event Detection Based on Curriculum Learning Considering Learning Difficulty of Events

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9414184 ◽

2021 ◽

Author(s):

Noriyuki Tonami ◽

Keisuke Imoto ◽

Yuki Okamoto ◽

Takahiro Fukumori ◽

Yoichi Yamashita

Keyword(s):

Event Detection ◽

Learning Difficulty ◽

Sound Event ◽

Sound Event Detection

Sound Event Detection by Consistency Training and Pseudo-Labeling With Feature-Pyramid Convolutional Recurrent Neural Networks

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp39728.2021.9414350 ◽

2021 ◽

Author(s):

Chih-Yuan Koh ◽

You-Siang Chen ◽

Yi-Wen Liu ◽

Mingsian R. Bai

Keyword(s):

Neural Networks ◽

Event Detection ◽

Recurrent Neural Networks ◽

Sound Event ◽

Feature Pyramid ◽

Sound Event Detection

Sound Event Detection in Underground Parking Garage Using Convolutional Neural Network

Big Data and Cognitive Computing ◽

10.3390/bdcc4030020 ◽

2020 ◽

Vol 4 (3) ◽

pp. 20 ◽

Cited By ~ 1

Author(s):

Giuseppe Ciaburro

Keyword(s):

Event Detection ◽

High Accuracy ◽

Surveillance Systems ◽

Urban Mobility ◽

Parking Garage ◽

Large Size ◽

Car Crash ◽

Sound Event ◽

Parking Management ◽

Sound Event Detection

Parking is a crucial element in urban mobility management. The availability of parking areas makes a service easier to use and helps determine its success. Proper parking management allows nearby businesses to increase their revenue. During off-peak hours, underground parking areas are uncrowded places where user safety is guaranteed by company overseers. Because of their large size, ensuring adequate surveillance would require many operators, which would increase parking fees. To reduce costs, video surveillance systems are used, in which one operator monitors many areas. However, some activities are beyond the control of this technology. In this work, a procedure to identify sound events in an underground garage is developed. The aim is to detect sounds that indicate dangerous situations and to trigger an automatic alert that draws the attention of surveillance staff to that area. To do this, the sounds of a parking sector were captured with sound sensors and analyzed by a sound detector based on convolutional neural networks. The procedure achieved high accuracy in identifying a car crash in an underground parking area.

Polyphonic Sound Event Detection Based on CapsNet-RNN and Post Processing Optimization

10.1109/icisce50968.2020.00208 ◽

2020 ◽

Author(s):

Liujun Zhang ◽

Liyan Luo ◽

Mei Wang ◽

Xiyu Song ◽

Shuting Guo ◽

...

Keyword(s):

Event Detection ◽

Post Processing ◽

Sound Event ◽

Sound Event Detection ◽

Processing Optimization
