Sound event detection in real life recordings using coupled matrix factorization of spectral representations and class activity annotations

Author(s):  
Annamaria Mesaros ◽  
Toni Heittola ◽  
Onur Dikmen ◽  
Tuomas Virtanen
2021 ◽  
Vol 11 (3) ◽  
pp. 1040
Author(s):  
Seokjin Lee ◽  
Minhan Kim ◽  
Seunghyeon Shin ◽  
Sooyoung Park ◽  
Youngho Jeong

In this paper, feature extraction methods are developed based on the non-negative matrix factorization (NMF) algorithm for application to weakly supervised sound event detection. Recently, various features and systems have been developed to tackle the problems of acoustic scene classification and sound event detection. However, most of these systems use data-independent spectral features, e.g., the Mel-spectrogram, log-Mel-spectrum, and gammatone filterbank. Some data-dependent feature extraction methods, including NMF-based methods, have recently demonstrated the potential to tackle these problems for long-term acoustic signals. In this paper, we further develop a recently proposed NMF-based feature extraction method so that it can be applied to weakly supervised sound event detection. To achieve this goal, we develop a strategy for training the frequency basis matrix using a heterogeneous database consisting of strongly and weakly labeled data. Moreover, we develop a non-iterative version of the NMF-based feature extraction method so that it can be applied as part of the model structure, similar to the modern “on-the-fly” transform used for the Mel-spectrogram. To detect the sound events, the temporal basis is calculated using the NMF method and then used as a feature for a mean-teacher-model-based classifier. The results are further improved by an event-wise post-processing method. To evaluate the proposed system, weakly supervised sound event detection experiments were conducted on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2020 Task 4 database. The results reveal that the proposed system achieves an F1-score comparable to those of the Mel-spectrogram and gammatonegram and performs 3–5% better than the log-Mel-spectrum and constant-Q transform.
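As a rough illustration of the activation-feature idea described in this abstract, the sketch below computes non-negative temporal activations for a fixed, pre-trained frequency basis and uses them as frame-level features. The shapes, variable names, and the per-frame non-negative least-squares solver are assumptions chosen for brevity; they stand in for, but are not, the paper's actual non-iterative NMF formulation.

```python
# Minimal sketch (not the paper's exact method): given a fixed frequency basis W,
# compute non-negative activations H per frame and use H as classifier features.
import numpy as np
from scipy.optimize import nnls

def nmf_activation_features(V, W):
    """V: (n_freq, n_frames) magnitude spectrogram; W: (n_freq, n_basis) fixed basis.
    Returns H: (n_basis, n_frames) non-negative activations."""
    n_basis, n_frames = W.shape[1], V.shape[1]
    H = np.zeros((n_basis, n_frames))
    for t in range(n_frames):
        H[:, t], _ = nnls(W, V[:, t])  # argmin_h ||W h - v_t||_2 subject to h >= 0
    return H

# Toy usage with random data standing in for a spectrogram and a trained basis.
rng = np.random.default_rng(0)
W = np.abs(rng.standard_normal((64, 20)))   # hypothetical pre-trained frequency basis
V = np.abs(rng.standard_normal((64, 100)))  # hypothetical magnitude spectrogram
features = nmf_activation_features(V, W)    # (20, 100) activation features
```

In this sketch the basis W plays the role of the frequency basis matrix trained offline from the heterogeneous labeled data, while the activations H correspond to the temporal features passed to the mean-teacher classifier.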


Author(s):  
Manh-Quan Bui ◽  
Viet-Hang Duong ◽  
Seksan Mathulaprangsan ◽  
Bach-Tung Pham ◽  
Wei-Jing Lee ◽  
...  

Author(s):  
Yingwei Fu ◽  
Kele Xu ◽  
Haibo Mi ◽  
Huaimin Wang ◽  
Dezhi Wang ◽  
...  

Sound event detection aims to analyze and recognize sound events in audio streams, and it has widespread real-life applications. Recently, deep neural networks such as convolutional recurrent neural networks have shown state-of-the-art performance in this task. However, previous methods were designed and implemented on devices with rich computing resources, and there are few applications on mobile devices. This paper focuses on a sound event detection solution for the mobile platform. The architecture of the solution comprises offline training and online detection. During the offline training process, a multi-model-based distillation method is used to compress the model and enable real-time detection. The online detection process includes acquisition of sensor data, processing of audio signals, and detection and recording of sound events. Finally, we implement an application on a mobile device that can detect sound events in near real time.
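To make the offline compression step concrete, the following Python/PyTorch sketch shows one common form of distillation for multi-label sound event detection: a student is trained against both the ground-truth labels and a teacher's soft predictions. It uses a single teacher, fixed loss weights, and a temperature parameter as illustrative assumptions; the paper's multi-model distillation recipe may differ.

```python
# Illustrative distillation loss (assumed single-teacher variant, not the paper's exact recipe).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, temperature=2.0):
    """Multi-label distillation: BCE against hard labels plus a soft-target term."""
    # Hard-label term: standard multi-label BCE used in sound event detection.
    hard = F.binary_cross_entropy_with_logits(student_logits, labels)
    # Soft-target term: match the frozen teacher's tempered sigmoid outputs.
    soft_teacher = torch.sigmoid(teacher_logits / temperature)
    soft = F.binary_cross_entropy_with_logits(student_logits / temperature, soft_teacher)
    return alpha * hard + (1.0 - alpha) * soft

# Toy usage with random tensors standing in for a training batch.
student_logits = torch.randn(8, 10, requires_grad=True)  # 8 clips, 10 event classes
teacher_logits = torch.randn(8, 10)                       # frozen teacher predictions
labels = torch.randint(0, 2, (8, 10)).float()             # weak clip-level labels
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

The compressed student produced by such training is what would then be deployed in the online detection stage on the mobile device.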

