Over-Parameterization and Generalization in Audio Classification

2021 ◽  
Author(s):  
Khaled Koutini ◽  
Hamid Eghbal-zadeh ◽  
Florian Henkel ◽  
Jan Schlüter ◽  
Gerhard Widmer

Convolutional Neural Networks (CNNs) have been dominating classification tasks in various domains, such as machine vision, machine listening, and natural language processing. In machine listening, while generally exhibiting very good generalization capabilities, CNNs are sensitive to the specific audio recording device used, which has been recognized as a substantial problem in the acoustic scene classification (DCASE) community. In this study, we investigate the relationship between over-parameterization of acoustic scene classification models, and their resulting generalization abilities. Our results indicate that increasing width improves generalization to unseen devices, even without an increase in the number of parameters.
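The claim that width can grow without adding parameters can be made concrete with grouped convolutions: doubling the channel count while using four groups leaves the 3 × 3 weight count unchanged. A minimal counting sketch (illustrative only; the function and configuration are ours, not necessarily the authors' construction):

```python
def conv3x3_params(c_in, c_out, groups=1):
    # weight count of a 3x3 convolution: each of the `groups` groups maps
    # c_in/groups input channels to c_out/groups output channels
    return (c_in // groups) * (c_out // groups) * 3 * 3 * groups

narrow = conv3x3_params(64, 64)             # standard 3x3 conv: 36864 weights
wide = conv3x3_params(128, 128, groups=4)   # twice the width, same 36864 weights
```

The wide layer has twice as many channels (more representational width) at an identical parameter budget, which is the kind of comparison the study's result rests on.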

2019 ◽  
Vol 28 (6) ◽  
pp. 1177-1183
Author(s):  
Pengyuan Zhang ◽  
Hangting Chen ◽  
Haichuan Bai ◽  
Qingsheng Yuan

Mathematics ◽  
2021 ◽  
Vol 9 (17) ◽  
pp. 2144
Author(s):  
Chaim Baskin ◽  
Evgenii Zheltonozhkii ◽  
Tal Rozen ◽  
Natan Liss ◽  
Yoav Chai ◽  
...  

Convolutional Neural Networks (CNNs) are very popular in many fields, including computer vision, speech recognition, and natural language processing. Though deep learning leads to groundbreaking performance in those domains, the networks used are very computationally demanding and are far from real-time performance even on a GPU, which in turn is not power-efficient and therefore unsuitable for low-power systems such as mobile devices. To overcome this challenge, solutions have been proposed for quantizing the weights and activations of these networks, which accelerates runtime significantly. Yet, this acceleration comes at the cost of a larger error unless spatial adjustments are carried out. The method proposed in this work trains quantized neural networks by noise injection and a learned clamping, which improve accuracy. This leads to state-of-the-art results on various regression and classification tasks, e.g., ImageNet classification with architectures such as ResNet-18/34/50 using weights and activations as low as 3 bits. We implement the proposed solution on an FPGA to demonstrate its applicability for low-power real-time applications. The quantization code will be made publicly available upon acceptance.
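The clamping-plus-quantization step can be sketched as follows: activations are clamped to a learned range [0, α] and snapped to a uniform grid; during training, rounding is replaced by additive uniform noise of half a quantization step so gradients can flow. This is a simplified sketch of the idea with our own parameter names, not the paper's exact formulation (α would be a trainable parameter in the real network):

```python
import numpy as np

def quantize(x, alpha, bits=3, train=False, rng=None):
    # clamp to [0, alpha] (alpha is learned in the actual method),
    # then map onto 2**bits - 1 uniform levels
    x = np.clip(x, 0.0, alpha)
    step = alpha / (2**bits - 1)
    if train:  # noise injection: simulate rounding error with uniform noise
        rng = rng or np.random.default_rng()
        return x + rng.uniform(-step / 2, step / 2, x.shape)
    return np.round(x / step) * step

x = np.array([-0.2, 0.1, 0.5, 1.3])
xq = quantize(x, alpha=1.0, bits=3)  # values snapped to a 7-level grid in [0, 1]
```

At 3 bits the grid has only seven non-zero levels, which is why clamping the range tightly (rather than covering rare outliers) matters for accuracy.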


2021 ◽  
Vol 13 (20) ◽  
pp. 4143
Author(s):  
Jianrong Zhang ◽  
Hongwei Zhao ◽  
Jiao Li

Remote sensing scene classification remains challenging due to the complexity and variety of scenes. With the development of attention-based methods, Convolutional Neural Networks (CNNs) have achieved competitive performance in remote sensing scene classification tasks. As an important attention-based model, the Transformer has achieved great success in the field of natural language processing, and has recently been applied to computer vision tasks. However, most existing methods divide the original image into multiple patches and encode the patches as the input of the Transformer, which limits the model’s ability to learn the overall features of the image. In this paper, we propose a new remote sensing scene classification method, the Remote Sensing Transformer (TRS), a powerful “pure CNN → Convolution + Transformer → pure Transformer” structure. First, we integrate self-attention into ResNet in a novel way, replacing the 3 × 3 spatial convolutions in the bottleneck with our proposed Multi-Head Self-Attention layer. Then we connect multiple pure Transformer encoders to further improve the representation learning performance, depending entirely on attention. Finally, we use a linear classifier for classification. We train our model on four public remote sensing scene datasets: UC-Merced, AID, NWPU-RESISC45, and OPTIMAL-31. The experimental results show that TRS exceeds the state-of-the-art methods and achieves higher accuracy.
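Replacing a 3 × 3 convolution with multi-head self-attention treats the H × W positions of the bottleneck feature map as tokens that attend to one another. A bare-bones NumPy sketch of such an attention layer (our own simplified version; the paper's layer additionally involves positional information and its specific projection design):

```python
import numpy as np

def mhsa(x, wq, wk, wv, heads=4):
    # x: (tokens, d) -- e.g. a 7x7 feature map flattened to 49 spatial tokens
    n, d = x.shape
    dh = d // heads
    q, k, v = x @ wq, x @ wk, x @ wv
    out = np.empty_like(x)
    for h in range(heads):
        s = slice(h * dh, (h + 1) * dh)
        scores = q[:, s] @ k[:, s].T / np.sqrt(dh)      # (n, n) attention logits
        scores = np.exp(scores - scores.max(axis=1, keepdims=True))
        att = scores / scores.sum(axis=1, keepdims=True)  # softmax over tokens
        out[:, s] = att @ v[:, s]                         # weighted sum of values
    return out

rng = np.random.default_rng(0)
d = 16
x = rng.normal(size=(49, d))                  # flattened 7x7 map, 16 channels
wq, wk, wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
y = mhsa(x, wq, wk, wv)
```

Unlike a 3 × 3 convolution, every output position here aggregates information from all 49 positions, which is the global-context property the paper exploits.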


Author(s):  
Y. Losieva

The article surveys the state of the art in vector representations of words for natural language processing. Three main types of word representation are described: static word embeddings, representations produced by deep neural networks, and dynamic (contextual) word embeddings derived from the surrounding text. This is a highly active and in-demand area in natural language processing, computational linguistics, and artificial intelligence in general. Several models of vector word representation (word embeddings) are considered in chronological order of their appearance, from the simplest (representations describing the occurrence of words within a document, or learning the relationship between a pair of words) to multilayer neural networks and deep bidirectional transformers for language understanding. For each model, the improvements over its predecessors are described, along with its advantages and disadvantages and the cases or tasks for which it is better suited.
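The simplest representation mentioned above, describing the occurrence of words within a document, can be illustrated in a few lines (a toy bag-of-words sketch, not any specific model from the article):

```python
from collections import Counter

def bow_vectors(docs):
    # bag-of-words: each document becomes a count vector over a shared vocabulary
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = []
    for d in docs:
        v = [0] * len(vocab)
        for w, c in Counter(d.lower().split()).items():
            v[index[w]] = c
        vecs.append(v)
    return vocab, vecs

vocab, vecs = bow_vectors(["the cat sat", "the cat and the dog"])
# vocab: ['and', 'cat', 'dog', 'sat', 'the']
```

Static embeddings such as word2vec replace these sparse counts with dense learned vectors; contextual models go further and assign each occurrence of a word its own vector.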


2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Krzysztof Wróbel ◽  
Michał Karwatowski ◽  
Maciej Wielgosz ◽  
Marcin Pietroń ◽  
Kazimierz Wiatr

Convolutional Neural Networks (CNNs) were created for image classification tasks. They were quickly applied to other domains, including Natural Language Processing (NLP). Nowadays, solutions based on artificial intelligence appear on mobile devices and in embedded systems, which places constraints on, among others, memory and power consumption. Due to the memory and computing requirements of CNNs, mapping them to hardware requires compression. This paper presents the results of compressing efficient CNNs for sentiment analysis. The main steps involve pruning and quantization. The process of mapping the compressed network to an FPGA and the results of this implementation are described. The conducted simulations showed that a 5-bit width is enough to ensure no drop in accuracy compared to the floating-point version of the network. Additionally, the memory footprint was significantly reduced (by between 85% and 93% compared to the original model).
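The two compression steps can be sketched together: magnitude pruning zeroes the smallest weights, and the survivors are snapped to a uniform 5-bit grid. This is a simplified sketch with invented parameter values; the paper's actual pruning schedule and quantizer differ:

```python
import numpy as np

def compress(w, prune_frac=0.5, bits=5):
    # magnitude pruning: zero out the prune_frac smallest-magnitude weights
    thresh = np.quantile(np.abs(w), prune_frac)
    w = np.where(np.abs(w) < thresh, 0.0, w)
    # uniform symmetric quantization of the survivors to a 5-bit grid
    scale = (2**(bits - 1) - 1) / np.max(np.abs(w))
    return np.round(w * scale) / scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000)   # stand-in for a layer's float32 weights
wq = compress(w)
```

Storing 5-bit instead of 32-bit values already cuts weight memory by roughly 84% before pruning is accounted for, which is consistent with the 85–93% footprint reduction reported.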


2018 ◽  
Vol 8 (6) ◽  
pp. 20180033 ◽  
Author(s):  
Luca Saglietti ◽  
Federica Gerace ◽  
Alessandro Ingrosso ◽  
Carlo Baldassi ◽  
Riccardo Zecchina

Stochastic neural networks are a prototypical computational device able to build a probabilistic representation of an ensemble of external stimuli. Building on the relationship between inference and learning, we derive a synaptic plasticity rule that relies only on delayed activity correlations, and that shows a number of remarkable features. Our delayed-correlations matching (DCM) rule satisfies some basic requirements for biological feasibility: finite and noisy afferent signals, Dale’s principle and asymmetry of synaptic connections, locality of the weight update computations. Nevertheless, the DCM rule is capable of storing a large, extensive number of patterns as attractors in a stochastic recurrent neural network, under general scenarios without requiring any modification: it can deal with correlated patterns, a broad range of architectures (with or without hidden neuronal states), one-shot learning with the palimpsest property, all the while avoiding the proliferation of spurious attractors. When hidden units are present, our learning rule can be employed to construct Boltzmann machine-like generative models, exploiting the addition of hidden neurons in feature extraction and classification tasks.
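To make the attractor-storage claim concrete, here is a deliberately reduced illustration: a symmetric Hebbian outer-product rule (the classical simplification of correlation-based plasticity, not the DCM rule itself, which is local, asymmetric, and built on delayed correlations) storing binary patterns as fixed points of a recurrent network:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 5
patterns = rng.choice([-1.0, 1.0], size=(p, n))

# accumulate activity correlations as an outer-product (Hebbian) weight update
W = np.zeros((n, n))
for xi in patterns:
    W += np.outer(xi, xi) / n
np.fill_diagonal(W, 0.0)   # no self-connections

# a stored pattern should be (approximately) a fixed point of sign dynamics
recalled = np.sign(W @ patterns[0])
```

The DCM rule of the paper achieves the analogous result under far harsher constraints: noisy finite inputs, Dale's principle, asymmetric connections, and extensively many correlated patterns.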


2017 ◽  
Vol 68 (10) ◽  
pp. 2224-2227 ◽  
Author(s):  
Camelia Gavrila

The aim of this paper is to determine a mathematical model that establishes the relationship between ozone levels, other meteorological data, and air quality. The model is valid for any season and any area, and is based on real-time data measured in Bucharest and its surroundings. The study uses artificial neural networks to model the nonlinear relationships between the ozone immission concentration and the meteorological factors relative humidity (RH), global solar radiation (SR), and air temperature (TEMP). The ozone concentration also depends on the following primary pollutants: nitrogen oxides (NO, NO2) and carbon monoxide (CO). To achieve this, the Levenberg-Marquardt algorithm was implemented in Scilab, a numerical computation software package. Sensitivity tests confirmed the robustness of the model and its applicability to short-term ozone prediction.
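The paper implements Levenberg-Marquardt in Scilab; as a language-neutral illustration, the damped least-squares iteration can be sketched in a few lines of Python. This is a generic LM loop on a toy exponential model, not the paper's network or data:

```python
import numpy as np

def levenberg_marquardt(f, jac, p0, x, y, iters=50, lam=1e-3):
    # minimize the sum of squared residuals r(p) = f(x, p) - y
    p = np.asarray(p0, dtype=float)
    for _ in range(iters):
        r = f(x, p) - y
        J = jac(x, p)
        A = J.T @ J + lam * np.eye(len(p))   # damped normal equations
        step = np.linalg.solve(A, J.T @ r)
        if np.sum((f(x, p - step) - y)**2) < np.sum(r**2):
            p, lam = p - step, lam * 0.5     # accept step: reduce damping
        else:
            lam *= 2.0                       # reject step: increase damping
    return p

# toy model y = a * exp(b * x) with true parameters (2.0, -1.0)
f = lambda x, p: p[0] * np.exp(p[1] * x)
jac = lambda x, p: np.column_stack([np.exp(p[1] * x),
                                    p[0] * x * np.exp(p[1] * x)])
x = np.linspace(0.0, 2.0, 20)
y = f(x, [2.0, -1.0])
p = levenberg_marquardt(f, jac, [1.0, 0.0], x, y)
```

The damping term λ interpolates between gradient descent (large λ) and Gauss-Newton (small λ), which is what makes the method robust for nonlinear fits such as neural-network training on small meteorological datasets.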

