A Low-Complexity Deep Learning Framework for Acoustic Scene Classification
In this paper, we present a low-complexity deep learning framework for acoustic scene classification (ASC). The proposed framework can be separated into three main steps: front-end spectrogram extraction, back-end classification, and late fusion of predicted probabilities. First, we use a Mel filter, a Gammatone filter, and the Constant-Q Transform (CQT) to transform the raw audio signal into spectrograms, in which both frequency and temporal features are presented. The three spectrograms are then fed into three individual back-end convolutional neural networks (CNNs), each classifying a recording into one of ten urban scenes. Finally, a late fusion of the three predicted probabilities obtained from the three CNNs is conducted to achieve the final classification result. To reduce the complexity of the proposed CNN networks, we apply two model compression techniques: model restriction and decomposed convolution. Our extensive experiments, conducted on the DCASE 2021 (IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events) Task 1A development dataset, achieve a low-complexity CNN-based framework with 128 KB of trainable parameters and a best classification accuracy of 66.7%, improving on the DCASE baseline by 19.0%.
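The late-fusion step described above can be sketched as follows. This is a minimal illustration, not the paper's exact fusion rule: it assumes mean-rule fusion, that each back-end CNN outputs a probability matrix of shape (batch, 10) over the ten urban scenes, and the function name `late_fusion` is hypothetical.

```python
import numpy as np

def late_fusion(prob_list):
    """Mean-rule late fusion: average the predicted class probabilities
    from several back-end classifiers, then take the argmax per sample.

    prob_list : list of arrays, each of shape (batch, n_classes),
                e.g. the softmax outputs of the three back-end CNNs.
    Returns the fused probabilities and the predicted class indices.
    """
    # Stack along a new "classifier" axis and average it away.
    fused = np.mean(np.stack(prob_list, axis=0), axis=0)
    return fused, fused.argmax(axis=-1)

# Hypothetical softmax outputs of three back-end CNNs for one recording
# (three classes shown here for brevity instead of the ten scenes).
p_mel = np.array([[0.6, 0.2, 0.2]])
p_gam = np.array([[0.2, 0.6, 0.2]])
p_cqt = np.array([[0.5, 0.3, 0.2]])

fused, pred = late_fusion([p_mel, p_gam, p_cqt])
# fused probabilities still sum to 1; the fused prediction is class 0
```

Other fusion rules (e.g. product or learned weights) drop in by replacing the `np.mean` reduction.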