Robust Acoustic Scene Classification in the Presence of Active Foreground Speech

2021 ◽  
Author(s):  
Siyuan Song ◽  
Brecht Desplanques ◽  
Celest De Moor ◽  
Kris Demuynck ◽  
Nilesh Madhu

We present an iVector-based Acoustic Scene Classification (ASC) system suited for real-life settings where active foreground speech can be present. In the proposed system, each recording is represented by a fixed-length iVector that models the recording’s important properties. A regularized Gaussian backend classifier with class-specific covariance models is used to extract the relevant acoustic scene information from these iVectors. To alleviate the large performance degradation when a foreground speaker dominates the captured signal, we investigate the use of the iVector framework on Mel-Frequency Cepstral Coefficients (MFCCs) derived from an estimate of the noise power spectral density. This noise floor can be extracted in a statistical manner from single-channel recordings. We show that the use of noise-floor features is complementary to multi-condition training, in which foreground speech is added to the training signals to reduce the mismatch between training and testing conditions. Experimental results on the DCASE 2016 Task 1 dataset show that the noise-floor-based features and multi-condition training realize significant classification accuracy gains of more than 25 percentage points (absolute) in the most adverse conditions. These promising results can further facilitate the integration of ASC in resource-constrained devices such as hearables.
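The abstract does not specify the statistical noise-floor estimator. A minimal sketch of the idea, assuming librosa and SciPy, with a simple rolling-minimum (minimum-statistics-style) estimate standing in for the authors' actual method:

```python
# Hedged sketch: MFCCs computed from a noise-floor estimate of a
# single-channel recording. The rolling minimum over the power
# spectrogram is an illustrative stand-in for the paper's estimator.
import numpy as np
import librosa
import scipy.ndimage

def noise_floor_mfcc(y, sr, n_mfcc=20, win_frames=100):
    # Short-time power spectrogram of the recording.
    S = np.abs(librosa.stft(y, n_fft=1024, hop_length=512)) ** 2
    # A rolling minimum over time tracks the noise power spectral
    # density, suppressing transient foreground speech.
    noise_psd = scipy.ndimage.minimum_filter1d(S, size=win_frames, axis=1)
    # Map the noise floor onto a mel filterbank and take MFCCs.
    mel = librosa.feature.melspectrogram(S=noise_psd, sr=sr)
    return librosa.feature.mfcc(S=librosa.power_to_db(mel), sr=sr,
                                n_mfcc=n_mfcc)
```

These frame-level features would then feed the iVector extractor in place of (or alongside) conventional MFCCs.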

Informatica ◽  
2017 ◽  
Vol 28 (1) ◽  
pp. 193-214 ◽  
Author(s):  
Tung-Tso Tsai ◽  
Sen-Shan Huang ◽  
Yuh-Min Tseng

Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4496
Author(s):  
Vlad Pandelea ◽  
Edoardo Ragusa ◽  
Tommaso Apicella ◽  
Paolo Gastaldo ◽  
Erik Cambria

Emotion recognition, among other natural language processing tasks, has greatly benefited from the use of large transformer models. Deploying these models on resource-constrained devices, however, is a major challenge due to their computational cost. In this paper, we show that the combination of large transformers, as high-quality feature extractors, and simple hardware-friendly classifiers based on linear separators can achieve competitive performance while allowing real-time inference and fast training. Various solutions, including batch and online sequential learning, are analyzed. Additionally, our experiments show that latency and performance can be further improved via dimensionality reduction and pre-training, respectively. The resulting system is implemented on two types of edge devices: an edge accelerator and two smartphones.
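A minimal sketch of the described pipeline, assuming a Hugging Face BERT encoder as the frozen feature extractor and scikit-learn's SGDClassifier as the hardware-friendly linear separator; the model name, pooling scheme, and toy labels are illustrative, not the authors' exact setup:

```python
# Hedged sketch: frozen transformer features + an online linear classifier.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import SGDClassifier

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(texts):
    # Mean-pooled last hidden states serve as fixed-length features.
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = enc(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return ((out * mask).sum(1) / mask.sum(1)).numpy()

# A linear separator that supports online sequential updates.
clf = SGDClassifier(loss="log_loss")
X = embed(["i am thrilled", "this is awful"])
y = np.array([1, 0])
clf.partial_fit(X, y, classes=np.array([0, 1]))
```

Because only the linear head is trained, updates like `partial_fit` are cheap enough for on-device training, while the heavy encoder runs inference only.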


Information ◽  
2021 ◽  
Vol 12 (7) ◽  
pp. 264
Author(s):  
Jinghan Wang ◽  
Guangyue Li ◽  
Wenzhao Zhang

The strong performance of deep learning is widely recognized. As research has deepened, however, neural networks have become more complex and do not generalize easily to resource-constrained devices. The emergence of a series of model compression algorithms makes artificial intelligence on the edge possible. Among them, structured model pruning is widely utilized because of its versatility. Structured pruning prunes the neural network itself, discarding relatively unimportant structures to reduce the model’s size. However, previous pruning work suffers from problems such as network evaluation errors, empirically determined pruning rates, and low retraining efficiency. Therefore, we propose an accurate, objective, and efficient pruning algorithm, Combine-Net, which introduces Adaptive BN to eliminate evaluation errors, the Kneedle algorithm to determine the pruning rate objectively, and knowledge distillation to improve retraining efficiency. Results show that, without loss of precision, Combine-Net achieves 95% parameter compression and 83% computation compression with VGG16 on CIFAR10, and 71% parameter compression and 41% computation compression with ResNet50 on CIFAR100. Experiments on different datasets and models prove that Combine-Net can efficiently compress a neural network’s parameters and computation.
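A minimal sketch of the Adaptive BN step, assuming PyTorch: BatchNorm running statistics are reset and re-estimated on a few batches so a pruned candidate is evaluated with statistics matching its new structure (the function name and batch budget are illustrative):

```python
# Hedged sketch: Adaptive-BN-style recalibration of a pruned network.
import torch

def adaptive_bn(model, loader, n_batches=50):
    # Reset BatchNorm running statistics; stale statistics from the
    # unpruned network are a major source of evaluation error.
    for m in model.modules():
        if isinstance(m, torch.nn.modules.batchnorm._BatchNorm):
            m.reset_running_stats()
            m.momentum = None  # use a cumulative moving average
    # Refresh the statistics with a few forward passes, no gradients.
    model.train()
    with torch.no_grad():
        for i, (x, _) in enumerate(loader):
            if i >= n_batches:
                break
            model(x)
    model.eval()
    return model
```

Candidate subnetworks recalibrated this way can then be ranked by validation accuracy, and a knee-detection step (Kneedle) picks the pruning rate where accuracy starts to drop sharply.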


2021 ◽  
Vol 5 (4) ◽  
pp. 1-28
Author(s):  
Chia-Heng Tu ◽  
Qihui Sun ◽  
Hsiao-Hsuan Chang

Monitoring environmental conditions is an important application of cyber-physical systems. Typically, the monitoring is done by perceiving the surrounding environment with battery-powered, tiny devices deployed in the field. While deep learning-based methods, especially convolutional neural networks (CNNs), are promising approaches to enriching the functionality offered by tiny devices, they demand more computation and memory resources, making them difficult to adopt on such devices. In this article, we develop a software framework, RAP, that permits the construction of CNN designs by aggregating existing, lightweight CNN layers that fit in the limited memory (e.g., several KBs of SRAM) of resource-constrained devices while satisfying application-specific timing constraints. RAP leverages the Python-based neural network framework Chainer to build CNNs by mounting C/C++ implementations of the lightweight layers, trains the built CNN models using the ordinary Chainer training procedure, and generates C code for the trained models. The generated programs are compiled into target machine executables for on-device inference. With the vigorous development of lightweight CNNs, such as binarized neural networks with binary weights and activations, RAP facilitates the model building process for resource-constrained devices by allowing developers to alter, debug, and evaluate CNN designs over the C/C++ implementations of the lightweight CNN layers. We have prototyped the RAP framework and built two environmental monitoring applications for protecting endangered species using image- and acoustic-based monitoring methods. Our results show that the built model consumes less than 0.5 KB of SRAM for buffering the runtime data required by model inference, while achieving up to 93% accuracy for acoustic monitoring with less than one second of inference time on the TI 16-bit microcontroller platform.
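RAP's own layer-mounting API is not shown in the abstract. As a rough illustration of the Chainer side it builds on, here is a tiny CNN assembled from standard Chainer links; the architecture is hypothetical, sized in the spirit of KB-scale SRAM budgets, and omits RAP's C/C++ layer bindings and code generation:

```python
# Hedged sketch: a tiny CNN defined in Chainer, the kind of model RAP
# would train before emitting a C version for the microcontroller.
import chainer
import chainer.functions as F
import chainer.links as L

class TinyCNN(chainer.Chain):
    def __init__(self, n_classes=2):
        super().__init__()
        with self.init_scope():
            # Very small channel counts keep runtime buffers in SRAM.
            self.conv1 = L.Convolution2D(1, 4, ksize=3, pad=1)
            self.conv2 = L.Convolution2D(4, 8, ksize=3, pad=1)
            self.fc = L.Linear(None, n_classes)  # lazy input size

    def __call__(self, x):
        h = F.max_pooling_2d(F.relu(self.conv1(x)), 2)
        h = F.max_pooling_2d(F.relu(self.conv2(h)), 2)
        return self.fc(h)
```

In RAP, the floating-point links sketched here would be swapped for the framework's lightweight (e.g., binarized) layer implementations before C code is generated.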

