Over-Parameterization and Generalization in Audio Classification

2021 ◽  
Author(s):  
Khaled Koutini ◽  
Hamid Eghbal-zadeh ◽  
Florian Henkel ◽  
Jan Schlüter ◽  
Gerhard Widmer

Convolutional Neural Networks (CNNs) have been dominating classification tasks in various domains, such as machine vision, machine listening, and natural language processing. In machine listening, while generally exhibiting very good generalization capabilities, CNNs are sensitive to the specific audio recording device used, which has been recognized as a substantial problem in the acoustic scene classification (DCASE) community. In this study, we investigate the relationship between over-parameterization of acoustic scene classification models, and their resulting generalization abilities. Our results indicate that increasing width improves generalization to unseen devices, even without an increase in the number of parameters.
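The claim that width can grow without adding parameters can be made concrete with grouped convolutions: doubling the channel count while using four groups leaves the 3 × 3 weight count unchanged. A minimal counting sketch (illustrative only; the function and configuration are ours, not necessarily the authors' construction):

```python
def conv3x3_params(c_in, c_out, groups=1):
    # weight count of a 3x3 convolution: each of the `groups` groups maps
    # c_in/groups input channels to c_out/groups output channels
    return (c_in // groups) * (c_out // groups) * 3 * 3 * groups

narrow = conv3x3_params(64, 64)             # standard 3x3 conv: 36864 weights
wide = conv3x3_params(128, 128, groups=4)   # twice the width, same 36864 weights
```

The wide layer has twice as many channels (more representational width) at an identical parameter budget, which is the kind of comparison the study's result rests on.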

2019 ◽  
Vol 28 (6) ◽  
pp. 1177-1183
Author(s):  
Pengyuan Zhang ◽  
Hangting Chen ◽  
Haichuan Bai ◽  
Qingsheng Yuan

Mathematics ◽  
2021 ◽  
Vol 9 (17) ◽  
pp. 2144
Author(s):  
Chaim Baskin ◽  
Evgenii Zheltonozhkii ◽  
Tal Rozen ◽  
Natan Liss ◽  
Yoav Chai ◽  
...  

Convolutional Neural Networks (CNNs) are very popular in many fields, including computer vision, speech recognition, and natural language processing. Though deep learning leads to groundbreaking performance in those domains, the networks used are very computationally demanding and are far from real-time performance even on a GPU, which in turn is not power-efficient and therefore unsuitable for low-power systems such as mobile devices. To overcome this challenge, solutions have been proposed for quantizing the weights and activations of these networks, which accelerates runtime significantly. Yet, this acceleration comes at the cost of a larger error unless spatial adjustments are carried out. The method proposed in this work trains quantized neural networks by noise injection and a learned clamping, which improve accuracy. This leads to state-of-the-art results on various regression and classification tasks, e.g., ImageNet classification with architectures such as ResNet-18/34/50 using weights and activations as low as 3 bits. We implement the proposed solution on an FPGA to demonstrate its applicability for low-power real-time applications. The quantization code will be made publicly available upon acceptance.
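The clamping-plus-quantization step can be sketched as follows: activations are clamped to a learned range [0, α] and snapped to a uniform grid; during training, rounding is replaced by additive uniform noise of half a quantization step so gradients can flow. This is a simplified sketch of the idea with our own parameter names, not the paper's exact formulation (α would be a trainable parameter in the real network):

```python
import numpy as np

def quantize(x, alpha, bits=3, train=False, rng=None):
    # clamp to [0, alpha] (alpha is learned in the actual method),
    # then map onto 2**bits - 1 uniform levels
    x = np.clip(x, 0.0, alpha)
    step = alpha / (2**bits - 1)
    if train:  # noise injection: simulate rounding error with uniform noise
        rng = rng or np.random.default_rng()
        return x + rng.uniform(-step / 2, step / 2, x.shape)
    return np.round(x / step) * step

x = np.array([-0.2, 0.1, 0.5, 1.3])
xq = quantize(x, alpha=1.0, bits=3)  # values snapped to a 7-level grid in [0, 1]
```

At 3 bits the grid has only seven non-zero levels, which is why clamping the range tightly (rather than covering rare outliers) matters for accuracy.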


2021 ◽  
Vol 13 (20) ◽  
pp. 4143
Author(s):  
Jianrong Zhang ◽  
Hongwei Zhao ◽  
Jiao Li

Remote sensing scene classification remains challenging due to the complexity and variety of scenes. With the development of attention-based methods, Convolutional Neural Networks (CNNs) have achieved competitive performance in remote sensing scene classification tasks. As an important attention-based model, the Transformer has achieved great success in the field of natural language processing, and has recently been applied to computer vision tasks. However, most existing methods divide the original image into multiple patches and encode the patches as the input of the Transformer, which limits the model’s ability to learn the overall features of the image. In this paper, we propose a new remote sensing scene classification method, the Remote Sensing Transformer (TRS), a powerful “pure CNN → Convolution + Transformer → pure Transformer” structure. First, we integrate self-attention into ResNet in a novel way, replacing the 3 × 3 spatial convolutions in the bottleneck with our proposed Multi-Head Self-Attention layer. Then we connect multiple pure Transformer encoders to further improve the representation learning performance, depending entirely on attention. Finally, we use a linear classifier for classification. We train our model on four public remote sensing scene datasets: UC-Merced, AID, NWPU-RESISC45, and OPTIMAL-31. The experimental results show that TRS exceeds the state-of-the-art methods and achieves higher accuracy.
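Replacing a 3 × 3 convolution with multi-head self-attention treats the H × W positions of the bottleneck feature map as tokens that attend to one another. A bare-bones NumPy sketch of such an attention layer (our own simplified version; the paper's layer additionally involves positional information and its specific projection design):

```python
import numpy as np

def mhsa(x, wq, wk, wv, heads=4):
    # x: (tokens, d) -- e.g. a 7x7 feature map flattened to 49 spatial tokens
    n, d = x.shape
    dh = d // heads
    q, k, v = x @ wq, x @ wk, x @ wv
    out = np.empty_like(x)
    for h in range(heads):
        s = slice(h * dh, (h + 1) * dh)
        scores = q[:, s] @ k[:, s].T / np.sqrt(dh)      # (n, n) attention logits
        scores = np.exp(scores - scores.max(axis=1, keepdims=True))
        att = scores / scores.sum(axis=1, keepdims=True)  # softmax over tokens
        out[:, s] = att @ v[:, s]                         # weighted sum of values
    return out

rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=(49, d))                  # flattened 7x7 map, 16 channels
wq, wk, wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
y = mhsa(x, wq, wk, wv)
```

Unlike a 3 × 3 convolution, every output position here aggregates information from all 49 positions, which is the global-context property the paper exploits.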


Author(s):  
Y. Losieva

The article surveys the state of the art in vector representations of words for natural language processing. Three main types of word representation are described: static word embeddings, representations produced by deep neural networks, and dynamic (contextual) word embeddings derived from the surrounding text. This is a highly active and in-demand area in natural language processing, computational linguistics, and artificial intelligence in general. Several models of vector word representation (word embeddings) are considered in chronological order of their appearance, from the simplest (representations describing the occurrence of words within a document, or learning the relationship between a pair of words) to multilayer neural networks and deep bidirectional transformers for language understanding. For each model, the improvements over its predecessors are described, along with its advantages and disadvantages and the cases or tasks for which it is better suited.
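The simplest representation mentioned above, describing the occurrence of words within a document, can be illustrated in a few lines (a toy bag-of-words sketch, not any specific model from the article):

```python
from collections import Counter

def bow_vectors(docs):
    # bag-of-words: each document becomes a count vector over a shared vocabulary
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = []
    for d in docs:
        v = [0] * len(vocab)
        for w, c in Counter(d.lower().split()).items():
            v[index[w]] = c
        vecs.append(v)
    return vocab, vecs

vocab, vecs = bow_vectors(["the cat sat", "the cat and the dog"])
# vocab: ['and', 'cat', 'dog', 'sat', 'the']
```

Static embeddings such as word2vec replace these sparse counts with dense learned vectors; contextual models go further and assign each occurrence of a word its own vector.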


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Krzysztof Wróbel ◽  
Michał Karwatowski ◽  
Maciej Wielgosz ◽  
Marcin Pietroń ◽  
Kazimierz Wiatr

Convolutional Neural Networks (CNNs) were created for image classification tasks. They were quickly applied to other domains, including Natural Language Processing (NLP). Nowadays, solutions based on artificial intelligence appear on mobile devices and in embedded systems, which places constraints on, among others, memory and power consumption. Due to the memory and computing requirements of CNNs, mapping them to hardware requires compression. This paper presents the results of compressing efficient CNNs for sentiment analysis. The main steps involve pruning and quantization. The process of mapping the compressed network to an FPGA and the results of this implementation are described. The conducted simulations showed that a 5-bit width is enough to ensure no drop in accuracy compared to the floating-point version of the network. Additionally, the memory footprint was significantly reduced (by between 85% and 93% compared to the original model).
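The two compression steps can be sketched together: magnitude pruning zeroes the smallest weights, and the survivors are snapped to a uniform 5-bit grid. This is a simplified sketch with invented parameter values; the paper's actual pruning schedule and quantizer differ:

```python
import numpy as np

def compress(w, prune_frac=0.5, bits=5):
    # magnitude pruning: zero out the prune_frac smallest-magnitude weights
    thresh = np.quantile(np.abs(w), prune_frac)
    w = np.where(np.abs(w) < thresh, 0.0, w)
    # uniform symmetric quantization of the survivors to a 5-bit grid
    scale = (2**(bits - 1) - 1) / np.max(np.abs(w))
    return np.round(w * scale) / scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000)   # stand-in for a layer's float32 weights
wq = compress(w)
```

Storing 5-bit instead of 32-bit values already cuts weight memory by roughly 84% before pruning is accounted for, which is consistent with the 85–93% footprint reduction reported.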


2018 ◽  
Vol 8 (6) ◽  
pp. 20180033 ◽  
Author(s):  
Luca Saglietti ◽  
Federica Gerace ◽  
Alessandro Ingrosso ◽  
Carlo Baldassi ◽  
Riccardo Zecchina

Stochastic neural networks are a prototypical computational device able to build a probabilistic representation of an ensemble of external stimuli. Building on the relationship between inference and learning, we derive a synaptic plasticity rule that relies only on delayed activity correlations, and that shows a number of remarkable features. Our delayed-correlations matching (DCM) rule satisfies some basic requirements for biological feasibility: finite and noisy afferent signals, Dale’s principle and asymmetry of synaptic connections, locality of the weight update computations. Nevertheless, the DCM rule is capable of storing a large, extensive number of patterns as attractors in a stochastic recurrent neural network, under general scenarios without requiring any modification: it can deal with correlated patterns, a broad range of architectures (with or without hidden neuronal states), one-shot learning with the palimpsest property, all the while avoiding the proliferation of spurious attractors. When hidden units are present, our learning rule can be employed to construct Boltzmann machine-like generative models, exploiting the addition of hidden neurons in feature extraction and classification tasks.
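To make the attractor-storage claim concrete, here is a deliberately reduced illustration: a symmetric Hebbian outer-product rule (the classical simplification of correlation-based plasticity, not the DCM rule itself, which is local, asymmetric, and built on delayed correlations) storing binary patterns as fixed points of a recurrent network:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 5
patterns = rng.choice([-1.0, 1.0], size=(p, n))

# accumulate activity correlations as an outer-product (Hebbian) weight update
W = np.zeros((n, n))
for xi in patterns:
    W += np.outer(xi, xi) / n
np.fill_diagonal(W, 0.0)   # no self-connections

# a stored pattern should be (approximately) a fixed point of sign dynamics
recalled = np.sign(W @ patterns[0])
```

The DCM rule of the paper achieves the analogous result under far harsher constraints: noisy finite inputs, Dale's principle, asymmetric connections, and extensively many correlated patterns.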


2017 ◽  
Vol 68 (10) ◽  
pp. 2224-2227 ◽  
Author(s):  
Camelia Gavrila

The aim of this paper is to determine a mathematical model that establishes the relationship between ozone levels, other meteorological data, and air quality. The model is valid for any season and any area, and is based on real-time data measured in Bucharest and its surroundings. The study uses artificial neural networks to model the nonlinear relationships between the ozone immission concentration and the meteorological factors relative humidity (RH), global solar radiation (SR), and air temperature (TEMP). The ozone concentration also depends on the following primary pollutants: nitrogen oxides (NO, NO2) and carbon monoxide (CO). To achieve this, the Levenberg-Marquardt algorithm was implemented in Scilab, a numerical computation software package. Sensitivity tests confirmed the robustness of the model and its applicability to short-term ozone prediction.
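The paper implements Levenberg-Marquardt in Scilab; as a language-neutral illustration, the damped least-squares iteration can be sketched in a few lines of Python. This is a generic LM loop on a toy exponential model, not the paper's network or data:

```python
import numpy as np

def levenberg_marquardt(f, jac, p0, x, y, iters=50, lam=1e-3):
    # minimize the sum of squared residuals r(p) = f(x, p) - y
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        r = f(x, p) - y
        J = jac(x, p)
        A = J.T @ J + lam * np.eye(len(p))   # damped normal equations
        step = np.linalg.solve(A, J.T @ r)
        if np.sum((f(x, p - step) - y)**2) < np.sum(r**2):
            p, lam = p - step, lam * 0.5     # accept step: reduce damping
        else:
            lam *= 2.0                       # reject step: increase damping
    return p

# toy model y = a * exp(b * x) with true parameters (2.0, -1.0)
f = lambda x, p: p[0] * np.exp(p[1] * x)
jac = lambda x, p: np.column_stack([np.exp(p[1] * x),
                                    p[0] * x * np.exp(p[1] * x)])
x = np.linspace(0.0, 2.0, 20)
y = f(x, [2.0, -1.0])
p = levenberg_marquardt(f, jac, [1.0, 0.0], x, y)
```

The damping term λ interpolates between gradient descent (large λ) and Gauss-Newton (small λ), which is what makes the method robust for nonlinear fits such as neural-network training on small meteorological datasets.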

