Robust acoustic event recognition using AVMD-PWVD time-frequency image

2021, Vol. 178, pp. 107970
Author(s): Yanhua Zhang, Ke Zhang, Jingyu Wang, Yu Su
Sensors, 2021, Vol. 21 (19), pp. 6622
Author(s): Barış Bayram, Gökhan İnce

Acoustic scene analysis (ASA) relies on the dynamic sensing and understanding of stationary and non-stationary sounds from various events, background noise and human actions with objects. However, the spatio-temporal characteristics of the sound signals may be non-stationary, and novel events may occur that degrade the performance of the analysis. In this study, a self-learning-based ASA for acoustic event recognition (AER) is presented to detect and incrementally learn novel acoustic events while tackling catastrophic forgetting. The proposed ASA framework comprises six elements: (1) raw acoustic signal pre-processing, (2) low-level and deep audio feature extraction, (3) acoustic novelty detection (AND), (4) acoustic signal augmentations, (5) incremental class-learning (ICL) of the audio features of the novel events and (6) AER. Self-learning on the different types of audio features extracted from the acoustic signals of various events proceeds without human supervision. For the extraction of deep audio representations, in addition to visual geometry group (VGG) and residual neural network (ResNet) models, time-delay neural network (TDNN) and TDNN-based long short-term memory (TDNN–LSTM) networks are pre-trained on a large-scale audio dataset, Google AudioSet. The performance of ICL with AND is validated using Mel-spectrograms, and deep features extracted from the Mel-spectrograms with TDNNs, VGG, and ResNet, on benchmark audio datasets such as ESC-10, ESC-50, UrbanSound8K (US8K), and an audio dataset collected by the authors in a real domestic environment.
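For illustration, a minimal Python sketch (not the authors' implementation) of steps (1) and (2) of such a pipeline: computing a log-Mel-spectrogram with librosa and extracting a fixed-length deep embedding from it with a generic ResNet backbone. The file name, sampling rate, Mel-band count, and the randomly initialized ResNet-18 standing in for the AudioSet-pre-trained networks are all assumptions.

```python
# Sketch of pre-processing + deep feature extraction for an acoustic event clip.
# Everything below (file name, sr, n_mels, backbone choice) is illustrative.
import librosa
import numpy as np
import torch
import torchvision

def log_mel_spectrogram(path, sr=16000, n_mels=64):
    """Load audio and compute a log-scaled Mel-spectrogram image."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)          # shape: (n_mels, frames)

def deep_embedding(mel_db):
    """Map a Mel-spectrogram to a fixed-length deep feature vector.
    A ResNet-18 is used as a stand-in; in the paper the backbones are
    pre-trained on Google AudioSet rather than initialized randomly."""
    backbone = torchvision.models.resnet18(weights=None)  # load pre-trained weights in practice
    backbone.fc = torch.nn.Identity()                     # keep the 512-d pooled features
    x = torch.tensor(mel_db, dtype=torch.float32)[None, None]  # (1, 1, n_mels, frames)
    x = x.repeat(1, 3, 1, 1)                              # replicate to 3 channels for ResNet
    with torch.no_grad():
        return backbone(x).squeeze(0)                     # 512-d embedding

emb = deep_embedding(log_mel_spectrogram("event.wav"))    # "event.wav" is a placeholder
print(emb.shape)                                          # torch.Size([512])
```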


2020, Vol. 34 (23), pp. 2050235
Author(s): Feizhen Huang, Jinfang Zeng, Yu Zhang, Wentao Xu

Sound-event recognition often utilizes time-frequency analysis to produce an image-like spectrogram that provides a rich visual representation of the original signal in time and frequency. Convolutional Neural Networks (CNNs), with their ability to learn discriminative spectrogram patterns, are well suited to sound-event recognition. However, CNNs alone make little use of the important temporal information in the signal. In this paper, we propose MCRNN, a Convolutional Recurrent Neural Network (CRNN) architecture for sound-event recognition; the letter “M” in the name denotes its multi-sized convolution filters. Richer features are extracted by applying several different convolution filter sizes in parallel at the last convolution layer. In addition, cochleagram images are used as the network input instead of the traditional spectrogram of the sound signal. Experiments on the RWCP dataset show that the proposed method achieves a recognition rate of 98.4% in clean conditions and robustly outperforms existing methods, improving the recognition rate by 0.9%, 1.9% and 10.3% at 20 dB, 10 dB and 0 dB signal-to-noise ratios (SNR), respectively.
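A rough PyTorch sketch of that multi-sized-filter idea (an illustration under stated assumptions, not the authors' code): two ordinary convolution blocks, a final layer of parallel 1×1, 3×3 and 5×5 convolutions whose outputs are concatenated, and a GRU over the time axis followed by a linear classifier. The channel widths, GRU size, 10-class output, and the 64-bin frequency axis standing in for the cochleagram input are all assumptions.

```python
# Sketch of a CRNN with multi-sized filters at the last convolution layer.
import torch
import torch.nn as nn

class MCRNNSketch(nn.Module):
    def __init__(self, n_classes=10, n_freq=64):
        super().__init__()
        self.features = nn.Sequential(                     # ordinary CNN front-end
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # multi-sized convolution filters at the last convolution layer ("M")
        self.multi = nn.ModuleList([
            nn.Conv2d(64, 32, k, padding=k // 2) for k in (1, 3, 5)
        ])
        self.gru = nn.GRU(input_size=32 * 3 * (n_freq // 4),
                          hidden_size=128, batch_first=True)
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, x):                                  # x: (batch, 1, freq, time)
        h = self.features(x)                               # (B, 64, freq/4, time/4)
        h = torch.cat([conv(h) for conv in self.multi], dim=1)   # concat parallel filters
        b, c, f, t = h.shape
        h = h.permute(0, 3, 1, 2).reshape(b, t, c * f)     # time-major feature sequence
        out, _ = self.gru(h)                               # recurrent modelling over time
        return self.classifier(out[:, -1])                 # logits from the last time step

logits = MCRNNSketch()(torch.randn(2, 1, 64, 128))         # random cochleagram-shaped input
print(logits.shape)                                        # torch.Size([2, 10])
```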

