Attacking Audio Event Detection Deep Learning Classifiers with White Noise

In training a deep learning system to perform audio transcription, two practical problems may arise. Firstly, most datasets are weakly labelled: they provide only a list of the events present in each recording, without any temporal information for training. Secondly, deep neural networks need a very large amount of labelled training data to perform well, yet in practice it is difficult to collect enough samples for most classes of interest. In this paper, we propose factorising the final task of audio transcription into multiple intermediate tasks in order to improve training performance when dealing with low-resource datasets of this kind. We evaluate three data-efficient approaches to training a stacked convolutional and recurrent neural network for the intermediate tasks. Our results show that different methods of training have different advantages and disadvantages.


Disrupting Audio Event Detection Deep Neural Networks with White Noise

Technologies ◽

10.3390/technologies9030064 ◽

2021 ◽

Vol 9 (3) ◽

pp. 64

Author(s):

Rodrigo dos Santos ◽

Ashwitha Kassetty ◽

Shirin Nilizadeh

Keyword(s):

Neural Networks ◽

White Noise ◽

Convolutional Neural Networks ◽

Event Detection ◽

Recurrent Neural Networks ◽

Deep Neural Networks ◽

Audio Event ◽

Noise Disturbances ◽

Classification Tasks ◽

Percent Success

Audio event detection (AED) systems can leverage the power of specialized algorithms for detecting the presence of a specific sound of interest within audio captured from the environment. More recent approaches rely on deep learning algorithms, such as convolutional neural networks and convolutional recurrent neural networks. Given these conditions, it is important to assess how vulnerable these systems can be to attacks. As such, we develop AED-suited convolutional neural networks and convolutional recurrent neural networks, and then attack them with white noise disturbances, designed to be simple to implement and use, even by attackers who are not tech-savvy. We develop this work under a safety-oriented scenario (AED systems for safety-related sounds, such as gunshots), and we show that an attacker can use such disturbances to avoid detection with a success rate of up to 100 percent. Prior work has shown that attackers can mislead image classification tasks; however, this work focuses on attacks against AED systems by tampering with their audio rather than image components. This work brings awareness to the designers and manufacturers of AED systems, as these solutions are vulnerable, yet may be trusted by individuals and families.
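To make the idea of a white noise disturbance concrete, the following is a minimal sketch of mixing Gaussian white noise into an audio signal at a chosen signal-to-noise ratio. The abstract does not state the noise levels or the mixing procedure the authors used, so the function name `add_white_noise`, the `snr_db` parameter, and the synthetic 440 Hz tone are all assumptions made purely for illustration.

```python
import math
import random


def add_white_noise(signal, snr_db, seed=0):
    """Return a copy of `signal` with Gaussian white noise mixed in.

    `snr_db` is the desired signal-to-noise ratio in decibels
    (a hypothetical knob; the source does not specify one).
    """
    rng = random.Random(seed)
    # Average power of the clean signal.
    signal_power = sum(s * s for s in signal) / len(signal)
    # Noise power that yields the requested signal-to-noise ratio.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    # White noise: independent zero-mean Gaussian samples.
    return [s + rng.gauss(0.0, sigma) for s in signal]


# Hypothetical stand-in for a recorded sound: one second of a 440 Hz tone at 8 kHz.
sample_rate = 8000
clean = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

# Noise as loud as the signal itself (0 dB SNR).
noisy = add_white_noise(clean, snr_db=0.0)
```

In an evaluation along the lines the abstract describes, the `noisy` signal would be passed to the trained classifier in place of `clean`, and the attack would count as successful when the sound of interest is no longer detected. Lowering `snr_db` makes the disturbance stronger.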


On the Performance Improvements of Deep Learning Methods for Audio Event Detection and Classification

10.1109/tsp52935.2021.9522625 ◽

2021 ◽

Author(s):

Claudio Eutizi ◽

Francesco Benedetto

Keyword(s):

Deep Learning ◽

Event Detection ◽

Learning Methods ◽

Performance Improvements ◽

Audio Event


Saliency detection in deep learning era: trends of development

Information and Control Systems ◽

10.31799/1684-8853-2019-3-10-36 ◽

2019 ◽

pp. 10-36 ◽

Cited By ~ 2

Author(s):

M. N. Favorskaya ◽

L. C. Jain

Keyword(s):

Deep Learning ◽

Object Detection ◽

Event Detection ◽

Visual Analysis ◽

Saliency Detection ◽

Salient Object Detection ◽

Public Image ◽

Detection Methods ◽

Salient Object ◽

Salient Event

Introduction: Saliency detection is a fundamental task of computer vision. Its ultimate aim is to localize the objects of interest that grab human visual attention with respect to the rest of the image. A great variety of saliency models based on different approaches have been developed since the 1990s. In recent years, saliency detection has become an actively studied topic in the theory of Convolutional Neural Networks (CNNs). Many original solutions using CNNs have been proposed for salient object detection and even event detection. Purpose: A detailed survey of saliency detection methods in the deep learning era makes it possible to understand the current capabilities of the CNN approach for visual analysis based on human eye tracking and digital image processing. Results: The survey reflects the recent advances in saliency detection using CNNs. Different models available in the literature, such as static and dynamic 2D CNNs for salient object detection and 3D CNNs for salient event detection, are discussed in chronological order. It is worth noting that automatic salient event detection in long videos became possible using recently introduced 3D CNNs combined with 2D CNNs for salient audio detection. This article also presents a short description of public image and video datasets with annotated salient objects or events, as well as commonly used evaluation metrics. Practical relevance: This survey is intended as a contribution to the study of rapidly developing deep learning methods for saliency detection in images and videos.


Time Aggregation Operators for Multi-label Audio Event Detection

10.21437/interspeech.2018-1637 ◽

2018 ◽

Author(s):

Pankaj Joshi ◽

Digvijaysingh Gautam ◽

Ganesh Ramakrishnan ◽

Preethi Jyothi

Keyword(s):

Event Detection ◽

Aggregation Operators ◽

Time Aggregation ◽

Audio Event


A SVM-Based Audio Event Detection System

2010 International Conference on Electrical and Control Engineering ◽

10.1109/icece.2010.78 ◽

2010 ◽

Cited By ~ 7

Author(s):

Li Lu ◽

Fengpei Ge ◽

Qingwei Zhao ◽

Yonghong Yan

Keyword(s):

Event Detection ◽

Detection System ◽

Audio Event


Analysis of Edge-Optimized Deep Learning Classifiers for Radar-based Gesture Recognition

IEEE Access ◽

10.1109/access.2021.3081353 ◽

2021 ◽

pp. 1-1

Author(s):

Mateusz Chmurski ◽

Mariusz Zubert ◽

Kay Bierzynski ◽

Avik Santra

Keyword(s):

Deep Learning ◽

Gesture Recognition ◽

Learning Classifiers


A semiautomatic annotation approach for sentiment analysis

Journal of Information Science ◽

10.1177/01655515211006594 ◽

2021 ◽

pp. 016555152110065

Author(s):

Rahma Alahmary ◽

Hmood Al-Dossari

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Short Term Memory ◽

Machine Learning Algorithms ◽

Support Vector ◽

Short Term ◽

Term Memory ◽

Annotation Process ◽

Learning Classifiers ◽

Long Short Term Memory

Sentiment analysis (SA) aims to extract users’ opinions automatically from their posts and comments. Almost all prior works have used machine learning algorithms. Recently, SA research has shown promising performance using the deep learning approach. However, deep learning is data-hungry and requires large datasets to learn, so data annotation takes more time. In this research, we proposed a semiautomatic approach using Naïve Bayes (NB) to annotate a new dataset in order to reduce the human effort and time spent on the annotation process. We created a dataset for training and testing the classifier by collecting Saudi dialect tweets. The accuracy achieved by the NB classifier was 83%. The trained semiautomatic model was used to annotate the new dataset, which was then used to train and test deep learning classifiers to perform Saudi dialect SA. The three deep learning classifiers tested in this research were convolutional neural network (CNN), long short-term memory (LSTM) and bidirectional long short-term memory (Bi-LSTM). Support vector machine (SVM) was used as the baseline for comparison. Overall, the performance of the deep learning classifiers exceeded that of SVM. CNN reported the highest performance, followed by Bi-LSTM, then LSTM, then SVM. The proposed semiautomatic annotation approach is usable and promising for saving time and effort in the annotation process.
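The semiautomatic annotation step described above can be sketched roughly as follows: a Naïve Bayes model is fitted on a small hand-labelled seed set and then used to label new, unlabelled texts. This is only an illustrative sketch, not the authors' implementation. The helper names `train_nb` and `annotate`, the whitespace tokenisation, the add-one smoothing, and the tiny English seed set (standing in for the Saudi dialect tweets) are all assumptions.

```python
import math
from collections import Counter, defaultdict


def train_nb(labelled):
    """Fit a multinomial Naive Bayes model from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labelled:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def annotate(model, text):
    """Return the most probable label for `text` under the model."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior of the class.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                count = word_counts[label][word]
                # Add-one (Laplace) smoothed log likelihood.
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labelled seed set.
seed_set = [
    ("great service very happy", "positive"),
    ("love this product", "positive"),
    ("terrible service very angry", "negative"),
    ("hate this product", "negative"),
]
model = train_nb(seed_set)

# New texts are labelled automatically instead of by hand.
unlabelled = ["very happy love it", "angry and hate it"]
auto_labelled = [(text, annotate(model, text)) for text in unlabelled]
```

The resulting `auto_labelled` pairs play the role of the automatically annotated dataset that is then fed to the deep learning classifiers. A human reviewer would still be expected to spot-check the labels, since the abstract reports 83% accuracy for the NB classifier.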
