Environmental Sound Classification via Time-Frequency Attention and Framewise Self-Attention based Deep Neural Networks

Author(s):  
Bo Wu ◽  
Xiao-Ping Zhang


Author(s):  
Ke Zhang ◽  
Yu Su ◽  
Jingyu Wang ◽  
Sanyu Wang ◽  
Yanhua Zhang

At present, environmental sound recognition systems mainly identify sounds using deep neural networks together with a wide variety of auditory features. It is therefore necessary to analyze which auditory features are best suited to deep neural network based environmental sound classification and recognition (ESCR) systems. In this paper, we choose three sound features based on two widely used filter banks: the Mel and Gammatone filter banks. Subsequently, the hybrid MGCC feature is presented. A deep convolutional neural network is then proposed to verify which features are more suitable for environmental sound classification and recognition tasks. The experimental results show that the signal-processing features outperform spectrogram features in deep neural network based environmental sound recognition, and that among all the acoustic features, the MGCC feature achieves the best performance. Finally, the proposed MGCC-CNN model is compared with state-of-the-art environmental sound classification models on the UrbanSound8K dataset, and it achieves the best classification accuracy.
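A minimal sketch of how such a hybrid Mel/Gammatone cepstral feature could be assembled is given below. The abstract does not specify filter orders, frame parameters, or the fusion scheme, so the simple stacking of MFCCs and Gammatone cepstral coefficients, the parameter values, and the use of the third-party `gammatone` package are assumptions made for illustration only.

```python
# Sketch of a hybrid Mel/Gammatone cepstral feature (assumed construction, not the
# paper's exact MGCC recipe). Requires librosa, scipy, numpy and the third-party
# `gammatone` package.
import numpy as np
import librosa
from scipy.fftpack import dct
from gammatone.gtgram import gtgram

def hybrid_cepstral_features(path, sr=22050, n_coeffs=40):
    y, sr = librosa.load(path, sr=sr)

    # Mel branch: MFCCs from a log-Mel spectrogram.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_coeffs)  # (n_coeffs, frames)

    # Gammatone branch: log Gammatone spectrogram followed by a DCT,
    # i.e. Gammatone cepstral coefficients.
    gt = gtgram(y, sr, window_time=0.025, hop_time=0.010, channels=64, f_min=50)
    gtcc = dct(np.log(gt + 1e-8), axis=0, norm="ortho")[:n_coeffs]  # (n_coeffs, frames)

    # Hybrid feature: stack the two cepstral representations along the feature axis,
    # truncating to a common number of frames.
    t = min(mfcc.shape[1], gtcc.shape[1])
    return np.vstack([mfcc[:, :t], gtcc[:, :t]])  # (2 * n_coeffs, frames)
```

The resulting two-dimensional feature map would then be fed to the deep convolutional network in place of a plain spectrogram.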


Author(s):  
Jinfang Zeng ◽  
Youming Li ◽  
Yu Zhang ◽  
Da Chen

Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. To date, a variety of signal processing and machine learning techniques have been applied to the ESC task, including matrix factorization, dictionary learning, wavelet filterbanks and deep neural networks. It is observed that features extracted from deeper networks tend to achieve higher performance than those extracted from shallow networks. However, in the ESC task only deep convolutional neural networks (CNNs) with a few layers have been used, while residual networks have been ignored, which leads to degraded performance. Meanwhile, a possible explanation for the limited exploration of CNNs and the difficulty of improving on simpler models is the relative scarcity of labeled data for ESC. In this paper, a residual network called EnvResNet is proposed for the ESC task. In addition, we propose to use audio data augmentation to overcome the problem of data scarcity. The experiments are performed on the ESC-50 database. Combined with data augmentation, the proposed model outperforms baseline implementations relying on mel-frequency cepstral coefficients and achieves results comparable to other state-of-the-art approaches in terms of classification accuracy.
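The following is a minimal sketch of the kind of residual building block such a network relies on. EnvResNet's actual depth, channel widths, and input representation are not given in the abstract, so the values below (and the single-channel spectrogram input) are illustrative assumptions.

```python
# Sketch of a residual block and a small residual stack for 50-class ESC
# (illustrative only, not the EnvResNet architecture). Requires PyTorch.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 conv layers with batch norm and an identity (or 1x1) skip connection."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        # Project the shortcut when the shape changes; otherwise pass it through unchanged.
        if stride == 1 and in_ch == out_ch:
            self.shortcut = nn.Sequential()
        else:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + self.shortcut(x))  # the residual connection

# Example: a small stack classifying 50 ESC-50 classes from a 1-channel spectrogram.
model = nn.Sequential(
    ResidualBlock(1, 32), ResidualBlock(32, 64, stride=2), ResidualBlock(64, 128, stride=2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 50),
)
logits = model(torch.randn(8, 1, 64, 200))  # batch of 8 spectrograms -> (8, 50)
```

The skip connection is what distinguishes a residual block from a plain stack of convolutions: it lets gradients flow around the convolutional layers, which is what makes substantially deeper networks trainable. Waveform-level augmentations such as time stretching, pitch shifting, or added noise would be applied before feature extraction to enlarge the training set.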


2020 ◽  
Vol 10 (11) ◽  
pp. 2764-2767
Author(s):  
Chuanbin Ge ◽  
Di Liu ◽  
Juan Liu ◽  
Bingshuai Liu ◽  
Yi Xin

Arrhythmia is a group of conditions in which the heartbeat is irregular. There are many types of arrhythmia, and some can be life-threatening. The electrocardiogram (ECG) is an effective clinical tool used to diagnose arrhythmia, and automatic recognition of different arrhythmia types in ECG signals has become an important and challenging issue. In this article, we propose an algorithm to detect arrhythmia in 12-lead ECG signals and classify the signals into 9 categories. Two 19-layer deep neural networks combining a convolutional neural network and a gated recurrent unit were proposed to realize this work. The first was trained directly on the raw 12-lead ECG data, while the other was trained on 18-"lead" ECG data, where the six extra leads, containing morphology information in the fractional time-frequency domain, were generated using the fractional Fourier transform (FRFT). Overall detection results were obtained by fusing the outputs of these two networks, and the final classification results show that our proposed algorithm obtained an F1 score of 0.855 on the testing dataset. Furthermore, with our proposed algorithm, a better F1 score of 0.81 was attained using the training dataset provided by the China Physiological Signal Challenge held in 2018.
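A minimal sketch of a CNN + GRU classifier of this kind is shown below. The 19-layer configuration is not described in the abstract, so the layer counts, kernel sizes, and pooling are illustrative assumptions, and the `CnnGru` module is a hypothetical stand-in for the paper's networks.

```python
# Sketch of a CNN + GRU classifier for multi-lead ECG (illustrative only; not the
# paper's 19-layer architecture). Requires PyTorch.
import torch
import torch.nn as nn

class CnnGru(nn.Module):
    def __init__(self, n_leads=12, n_classes=9):
        super().__init__()
        # 1-D convolutions extract local beat morphology from the multi-lead signal.
        self.cnn = nn.Sequential(
            nn.Conv1d(n_leads, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(4),
        )
        # A bidirectional GRU summarizes the sequence of convolutional features over time.
        self.gru = nn.GRU(input_size=64, hidden_size=64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 64, n_classes)

    def forward(self, x):              # x: (batch, leads, samples)
        h = self.cnn(x)                # (batch, 64, frames)
        h = h.transpose(1, 2)          # (batch, frames, 64) for the GRU
        _, hn = self.gru(h)            # hn: (2, batch, 64) for a bidirectional GRU
        hn = torch.cat([hn[0], hn[1]], dim=1)
        return self.head(hn)           # class logits, (batch, n_classes)

# A second network with n_leads=18 would take the FRFT-augmented "leads"; the two
# networks' outputs would then be fused (e.g. averaged) for the final decision.
logits = CnnGru()(torch.randn(4, 12, 5000))  # 4 recordings, 12 leads, 5000 samples
```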

