A Low-Complexity Deep Learning Framework for Acoustic Scene Classification

2021 ◽  
Author(s):  
Lam Pham ◽  
Hieu Tang ◽  
Anahid Jalal ◽  
Alexander Schindler ◽  
Ross King

In this paper, we present a low-complexity deep learning framework for acoustic scene classification (ASC). The proposed framework can be separated into three main steps: front-end spectrogram extraction, back-end classification, and late fusion of predicted probabilities. First, we use Mel filter, Gammatone filter, and Constant Q Transform (CQT) to transform the raw audio signal into spectrograms, where both frequency and temporal features are presented. The three spectrograms are then fed into three individual back-end convolutional neural networks (CNNs), classifying into ten urban scenes. Finally, a late fusion of the three predicted probabilities obtained from the three CNNs is conducted to achieve the final classification result. To reduce the complexity of our proposed CNN network, we apply two model compression techniques: model restriction and decomposed convolution. Our extensive experiments, conducted on the DCASE 2021 (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1A development dataset, achieve a low-complexity CNN-based framework with 128 KB of trainable parameters and a best classification accuracy of 66.7%, improving on the DCASE baseline by 19.0%.
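A minimal sketch of the front-end extraction and late fusion steps described above, assuming librosa for the Mel and CQT transforms (the Gammatone front-end is omitted here since librosa has no built-in Gammatone filterbank); the bin counts and the equal-weight fusion are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
import librosa

def extract_spectrograms(path, sr=44100):
    """Front-end: transform raw audio into log-scaled Mel and CQT spectrograms."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.power_to_db(
        librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128))
    cqt = librosa.amplitude_to_db(
        np.abs(librosa.cqt(y, sr=sr, n_bins=96)))
    return mel, cqt

def late_fusion(prob_list):
    """Average per-class probabilities predicted by the individual back-end CNNs."""
    return np.mean(np.stack(prob_list), axis=0)

# Hypothetical usage: each element of prob_list is a softmax output of shape
# (n_clips, 10) from one back-end CNN, one per spectrogram type.
# fused = late_fusion([probs_mel, probs_gamma, probs_cqt])
# scenes = fused.argmax(axis=1)  # index into the ten urban scene classes
```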

Electronics ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 371
Author(s):  
Yerin Lee ◽  
Soyoung Lim ◽  
Il-Youp Kwak

Acoustic scene classification (ASC) categorizes an audio file based on the environment in which it was recorded. This has long been studied in the Detection and Classification of Acoustic Scenes and Events (DCASE) challenges. This paper presents the solution to Task 1 of the DCASE 2020 challenge submitted by the Chung-Ang University team. Task 1 addressed two challenges that ASC faces in real-world applications. One is that audio recorded using different recording devices should still be classified well in general, and the other is that the model used should have low complexity. We proposed two models to overcome these problems. First, a more general classification model was proposed by combining harmonic-percussive source separation (HPSS) and deltas-delta-deltas features with four different models. Second, using the same features, depthwise separable convolution was applied to the convolutional layers to develop a low-complexity model. Moreover, using gradient-weighted class activation mapping (Grad-CAM), we investigated which parts of the features our model attends to when it classifies. Our proposed systems ranked 9th and 7th in the competition for these two subtasks, respectively.
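The two ingredients above can be sketched roughly as follows, assuming librosa for the HPSS and delta features and Keras for the depthwise separable convolutions; the feature layout and layer sizes are illustrative assumptions, not the team's submitted configuration.

```python
import numpy as np
import librosa
from tensorflow.keras import layers, models

def hpss_delta_features(y, sr=44100):
    """Log-mel spectrograms of the harmonic and percussive components,
    stacked with their deltas and delta-deltas along the channel axis."""
    stft = librosa.stft(y)
    harmonic, percussive = librosa.decompose.hpss(stft)
    mels = [librosa.power_to_db(
                librosa.feature.melspectrogram(S=np.abs(s) ** 2, sr=sr))
            for s in (harmonic, percussive)]
    base = np.stack(mels, axis=-1)                         # (mels, frames, 2)
    delta = librosa.feature.delta(base, axis=1)            # deltas
    delta2 = librosa.feature.delta(base, order=2, axis=1)  # delta-deltas
    return np.concatenate([base, delta, delta2], axis=-1)  # 6 channels

def low_complexity_cnn(input_shape, n_classes=10):
    """Depthwise separable convolutions in place of standard Conv2D layers."""
    inp = layers.Input(shape=input_shape)
    x = inp
    for filters in (32, 64, 128):
        x = layers.SeparableConv2D(filters, 3, padding='same',
                                   activation='relu')(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.GlobalAveragePooling2D()(x)
    out = layers.Dense(n_classes, activation='softmax')(x)
    return models.Model(inp, out)
```

A depthwise separable convolution factors a standard convolution into a per-channel spatial filter followed by a 1x1 pointwise mix, which is what cuts the parameter count for the low-complexity subtask.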


2021 ◽  
Author(s):  
Lam Pham ◽  
Alexander Schindler ◽  
Mina Schutz ◽  
Jasmin Lampert ◽  
Sven Schlarb ◽  
...  

In this paper, we present deep learning frameworks for audio-visual scene classification (SC) and indicate how individual visual and audio features, as well as their combination, affect SC performance. Our extensive experiments, which are conducted on the DCASE (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1B development dataset, achieve the best classification accuracies of 82.2%, 91.1%, and 93.9% with audio input only, visual input only, and both audio-visual inputs, respectively. The highest classification accuracy of 93.9%, obtained from an ensemble of audio-based and visual-based frameworks, shows an improvement of 16.5% compared with the DCASE baseline.
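A minimal sketch of the kind of ensemble this result suggests, assuming each framework emits per-clip softmax probabilities; the equal weighting is an assumption, as the abstract does not specify fusion weights.

```python
import numpy as np

def audio_visual_ensemble(audio_probs, visual_probs, w_audio=0.5):
    """Fuse per-class probabilities from the audio-based and visual-based
    frameworks; equal weighting here is an illustrative assumption."""
    return w_audio * audio_probs + (1.0 - w_audio) * visual_probs

# Hypothetical usage with softmax outputs of shape (n_clips, n_classes):
# fused = audio_visual_ensemble(audio_probs, visual_probs)
# predictions = fused.argmax(axis=1)
```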


2013 ◽  
Vol 475-476 ◽  
pp. 1633-1637
Author(s):  
Seung Yong Bae ◽  
Jong Do Lee ◽  
Eun Ju Choe ◽  
Gil Cho Ahn

This paper presents a low-distortion analog front-end (AFE) circuit to process the output signal of an electret microphone. A source follower is employed as the input buffer to interface the electret microphone directly to the IC with level shifting. A single-ended-to-differential converter with output common-mode control is presented to compensate for the common-mode variation resulting from gate-to-source voltage variation in the source follower. A replica stage is adopted to control the output bias voltage of the single-ended-to-differential converter. The prototype AFE circuit, fabricated in a 0.35 μm CMOS technology, achieves 68.2 dB peak SNDR and 79.9 dB SFDR over an audio signal bandwidth of 20 kHz with a 2.5 V supply while consuming 1.05 mW.


2015 ◽  
Author(s):  
Mahesh Kumar Nandwana ◽  
Hynek Bořil ◽  
John H. L. Hansen

Author(s):  
Wenshuai Chen ◽  
Shuiping Gou ◽  
Xinlin Wang ◽  
Licheng Jiao ◽  
Changzhe Jiao ◽  
...  

2021 ◽  
Author(s):  
Anh Nguyen ◽  
Khoa Pham ◽  
Dat Ngo ◽  
Thanh Ngo ◽  
Lam Pham

This paper provides an analysis of state-of-the-art activation functions with respect to the supervised classification performance of deep neural networks. These activation functions comprise Rectified Linear Units (ReLU), the Exponential Linear Unit (ELU), the Scaled Exponential Linear Unit (SELU), the Gaussian Error Linear Unit (GELU), and the Inverse Square Root Linear Unit (ISRLU). To evaluate them, experiments over two deep learning network architectures integrating these activation functions are conducted. The first model, based on a Multilayer Perceptron (MLP), is evaluated on the MNIST dataset to benchmark these activation functions. Meanwhile, the second model, a VGGish-based architecture, is applied to Acoustic Scene Classification (ASC) Task 1A in the DCASE 2018 challenge, thus evaluating whether these activation functions work well across different datasets as well as different network architectures.
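For reference, the five activation functions compared in the paper can be written directly in NumPy; the α parameters below use common defaults (1.0 for ELU and ISRLU, the standard SELU constants), which are assumptions rather than the paper's settings.

```python
import numpy as np
from scipy.special import erf

def relu(x):
    return np.maximum(0.0, x)

def elu(x, alpha=1.0):
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

def selu(x, alpha=1.6732632423543772, scale=1.0507009873554805):
    # Self-normalizing variant of ELU with fixed scaling constants.
    return scale * np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

def gelu(x):
    # Exact form: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

def isrlu(x, alpha=1.0):
    # Inverse square root linear unit: smooth ELU-like negative branch.
    return np.where(x >= 0, x, x / np.sqrt(1.0 + alpha * x * x))
```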

