Automatic Environmental Sound Recognition (AESR) Using Convolutional Neural Network

Author(s):  
Md. Rayhan Ahmed ◽  
Towhidul Islam Robin ◽  
Ashfaq Ali Shafin
Author(s):  
Ke Zhang ◽  
Yu Su ◽  
Jingyu Wang ◽  
Sanyu Wang ◽  
Yanhua Zhang

At present, environmental sound recognition systems mainly identify sounds using deep neural networks together with a wide variety of auditory features. It is therefore necessary to analyze which auditory features are most suitable for deep-neural-network-based environmental sound classification and recognition (ESCR) systems. In this paper, we chose three sound features based on two widely used filter banks: the Mel and Gammatone filter banks. Subsequently, the hybrid feature MGCC is presented. Finally, a deep convolutional neural network is proposed to verify which features are more suitable for environmental sound classification and recognition tasks. The experimental results show that the signal-processing features outperform the spectrogram features in the deep-neural-network-based environmental sound recognition system. Among all the acoustic features, the MGCC feature achieves the best performance. The MGCC-CNN model proposed in this paper is then compared with state-of-the-art environmental sound classification models on the UrbanSound8K dataset. The results show that the proposed model achieves the best classification accuracy.
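As an illustration of the hybrid-feature idea described above, the following is a minimal sketch that concatenates Mel-frequency cepstral coefficients with gammatone-filterbank cepstral coefficients into a single feature map for a CNN. It assumes librosa and SciPy are available; the file name, band count, and frame settings are illustrative, and the paper's exact MGCC construction may differ.

```python
# Minimal sketch of a hybrid Mel + Gammatone cepstral feature, assuming librosa
# and SciPy; the exact MGCC construction in the paper may differ.
import numpy as np
import librosa
from scipy.signal import gammatone, lfilter
from scipy.fft import dct

def gfcc(y, sr, n_bands=40, n_coeff=20, frame=1024, hop=512):
    """Gammatone-filterbank cepstral coefficients (illustrative GFCC variant)."""
    # Center frequencies spaced logarithmically between 100 Hz and ~0.9 * Nyquist.
    freqs = np.geomspace(100, sr / 2 * 0.9, n_bands)
    band_energies = []
    for fc in freqs:
        b, a = gammatone(fc, 'iir', fs=sr)           # 4th-order gammatone band filter
        band = lfilter(b, a, y)
        # Frame-wise log energy of the band output.
        frames = librosa.util.frame(band, frame_length=frame, hop_length=hop)
        band_energies.append(np.log(np.mean(frames ** 2, axis=0) + 1e-10))
    E = np.stack(band_energies)                       # (n_bands, n_frames)
    return dct(E, axis=0, norm='ortho')[:n_coeff]     # cepstral compression

y, sr = librosa.load('siren.wav', sr=22050)           # hypothetical audio clip
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20, n_fft=1024, hop_length=512)
gfc = gfcc(y, sr)

# Trim to a common frame count and stack into one hybrid Mel-Gammatone feature map.
n = min(mfcc.shape[1], gfc.shape[1])
mgcc = np.concatenate([mfcc[:, :n], gfc[:, :n]], axis=0)
print(mgcc.shape)                                     # (40, n), input to the CNN
```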


Artificial Neural Networks (ANNs) have evolved through many stages over the last three decades, with many researchers contributing to this challenging field. With the power of mathematics, complex problems can also be solved by ANNs. Architectures such as the Convolutional Neural Network (CNN), Deep Neural Network, Generative Adversarial Network (GAN), Long Short-Term Memory (LSTM) network, Recurrent Neural Network (RNN), and Neural Ordinary Differential Equation network play promising roles in many multinational companies and IT industries because of their predictive power and accuracy. In this paper, a Convolutional Neural Network is used to detect beep sounds at high noise levels. Based on supervised learning, the research develops a CNN architecture for beep sound recognition in noisy situations. The proposed method gives good results, with an accuracy of 96%. The prototype was tested with a few architectures on the training and test data, of which a two-layer CNN classifier produced the best predictions.
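As a rough illustration of the kind of model described above, the following is a minimal sketch of a two-convolutional-layer classifier built with tf.keras. The 64x64 log-spectrogram input patch and the two output classes (beep vs. background) are assumptions, not the paper's reported configuration.

```python
# Sketch of a small two-convolutional-layer CNN classifier, assuming tf.keras.
# Input shape and class count are illustrative, not taken from the paper.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),                   # log-spectrogram patch
    layers.Conv2D(16, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(2, activation='softmax'),             # beep vs. background noise
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=30)
```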


2021 ◽  
Author(s):  
Wenjun Yang

This thesis explores features characterizing temporal dynamics and the use of ensemble techniques to improve the performance of environmental sound recognition (ESR) systems. Firstly, for acoustic scene classification (ASC), the local binary pattern (LBP) technique is applied to extract the temporal evolution of Mel-frequency cepstral coefficient (MFCC) features, and the D3C ensemble classifier is adopted to optimize system performance. The results show that the proposed method achieved a classification improvement of 8% compared to the baseline system. Secondly, a new approach for sound event detection (SED) using Non-negative Matrix Factor 2-D Deconvolution (NMF2D) and RUSBoost is presented. The idea is to capture the joint two-dimensional spectral and temporal information in the time-frequency representation (TFR) while possibly separating the sound mixture into several sources. In addition, the RUSBoost ensemble technique is used in the event detection process to alleviate class imbalance in the training data. This method reduced the total error rate by 5% compared to the baseline method.
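To make the first idea concrete, the following is a minimal sketch that treats an MFCC matrix as a 2-D image and summarizes its local temporal evolution with uniform LBP codes, assuming librosa and scikit-image are available. The P/R parameters and the histogram pooling are illustrative choices, and the D3C ensemble stage is not shown.

```python
# Sketch of extracting LBP statistics from an MFCC time-frequency matrix,
# assuming librosa and scikit-image; parameters are illustrative.
import numpy as np
import librosa
from skimage.feature import local_binary_pattern

y, sr = librosa.load('scene.wav', sr=22050)            # hypothetical ASC recording
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)

# Quantize the MFCC matrix to an 8-bit grayscale image before computing LBP codes.
m = mfcc - mfcc.min()
img = np.uint8(255 * m / (m.max() + 1e-10))

# Encode the local temporal evolution of each coefficient with uniform LBP codes.
P, R = 8, 1
codes = local_binary_pattern(img, P, R, method='uniform')

# Pool the codes into a fixed-length histogram usable by any classifier.
hist, _ = np.histogram(codes, bins=np.arange(P + 3), density=True)
print(hist)                                             # P+2 uniform-LBP bins
```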


Electronics ◽  
2021 ◽  
Vol 10 (21) ◽  
pp. 2622
Author(s):  
Jurgen Vandendriessche ◽  
Nick Wouters ◽  
Bruno da Silva ◽  
Mimoun Lamrini ◽  
Mohamed Yassin Chkouri ◽  
...  

In recent years, Environmental Sound Recognition (ESR) has become a relevant capability for urban monitoring applications. The techniques for automated sound recognition often rely on machine learning approaches, which have increased in complexity in order to achieve higher accuracy. Nonetheless, such machine learning techniques often have to be deployed on resource- and power-constrained embedded devices, which has become a challenge with the adoption of deep learning approaches based on Convolutional Neural Networks (CNNs). Field-Programmable Gate Arrays (FPGAs) are power efficient and highly suitable for computationally intensive algorithms like CNNs. By fully exploiting their parallel nature, they have the potential to accelerate the inference time compared to other embedded devices. Similarly, dedicated architectures to accelerate Artificial Intelligence (AI) such as Tensor Processing Units (TPUs) promise to deliver high accuracy while achieving high performance. In this work, we evaluate existing tool flows to deploy CNN models on FPGAs as well as on TPU platforms. We propose and adjust several CNN-based sound classifiers to be embedded on such hardware accelerators. The results demonstrate the maturity of the existing tools and how FPGAs can be exploited to outperform TPUs.
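As one example of such a tool flow, the following sketch shows full-integer quantization of a trained Keras sound classifier with the TensorFlow Lite converter, a common preparatory step before compiling a model for a Coral Edge TPU. The model file, calibration data, and shapes are placeholders; the actual FPGA and TPU flows evaluated in the work may differ.

```python
# Sketch of a TPU-oriented deployment step: full-integer quantization of a Keras
# classifier via the TensorFlow Lite converter. File names and data are placeholders.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model('esr_cnn.h5')        # hypothetical trained classifier

def representative_dataset():
    # A few hundred calibration patches drawn from the training features.
    calib = np.load('calib_patches.npy').astype(np.float32)   # hypothetical file
    for patch in calib[:200]:
        yield [patch[np.newaxis, ...]]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open('esr_cnn_int8.tflite', 'wb') as f:
    f.write(converter.convert())
# The resulting .tflite model can then be compiled for the Edge TPU, e.g.:
#   edgetpu_compiler esr_cnn_int8.tflite
```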

