Learning Sound Events from Webly Labeled Data

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/384 ◽

2019 ◽

Cited By ~ 2

Author(s):

Anurag Kumar ◽

Ankit Shah ◽

Alexander Hauptmann ◽

Bhiksha Raj

Keyword(s):

Neural Networks ◽

Transfer Learning ◽

Learning Process ◽

Event Detection ◽

Deep Neural Networks ◽

Baseline Method ◽

Audio Recordings ◽

Audio Data ◽

Audio Event ◽

The Web

In the last couple of years, weakly labeled learning has turned out to be an exciting approach for audio event detection. In this work, we introduce webly labeled learning for sound events, which aims to remove human supervision altogether from the learning process. We first develop a method of obtaining labeled, albeit noisy, audio data from the web, in which no manual labeling is involved. We then describe methods to efficiently learn from these webly labeled audio recordings. In our proposed system, WeblyNet, two deep neural networks co-teach each other to robustly learn from webly labeled data, leading to around 17% relative improvement over the baseline method. The method also involves transfer learning to obtain efficient representations.

Device Invariant Deep Neural Networks for Pulmonary Audio Event Detection Across Mobile and Wearable Devices

10.1109/embc46164.2021.9629853 ◽

2021 ◽

Author(s):

Mohsin Y Ahmed ◽

Li Zhu ◽

Md Mahbubur Rahman ◽

Tousif Ahmed ◽

Jilong Kuang ◽

...

Keyword(s):

Neural Networks ◽

Event Detection ◽

Deep Neural Networks ◽

Wearable Devices ◽

Audio Event

Sound Event Detection Using Derivative Features in Deep Neural Networks

Applied Sciences ◽

10.3390/app10144911 ◽

2020 ◽

Vol 10 (14) ◽

pp. 4911

Author(s):

Jin-Yeol Kwak ◽

Yong-Joo Chung

Keyword(s):

Neural Network ◽

Neural Networks ◽

Event Detection ◽

Network Architecture ◽

Deep Neural Networks ◽

Audio Signal ◽

Feed Forward Neural Network ◽

Sound Event ◽

Audio Data ◽

Sound Event Detection

We propose using derivative features for sound event detection based on deep neural networks. As input to the networks, we used log-mel-filterbank features and their first and second derivatives for each frame of the audio signal. Two deep neural networks were used to evaluate the effectiveness of these derivative features. Specifically, a convolutional recurrent neural network (CRNN) was constructed by combining a convolutional neural network and a recurrent neural network (RNN), followed by a feed-forward neural network (FNN) acting as a classification layer. In addition, a mean-teacher model based on an attention CRNN was used. Both models had an average pooling layer at the output so that weakly labeled and unlabeled audio data could be used during model training. Across the various training conditions, which differed in neural network architecture and training set, the use of derivative features resulted in a consistent performance improvement. Experiments on audio data from the Detection and Classification of Acoustic Scenes and Events 2018 and 2019 challenges indicated a maximum relative improvement of 16.9% in terms of the F-score.
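
The input described in this abstract (static log-mel features plus first and second derivatives per frame) can be illustrated with a minimal sketch. The abstract does not say how the derivatives are computed; the regression-style delta formula, the window `width=2`, the edge-replication padding, and the names `delta` and `stack_with_derivatives` are all assumptions made for illustration.

```python
def delta(frames, width=2):
    """Regression-based time derivative of a list of per-frame feature vectors."""
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                # Replicate the first/last frame at the edges (an assumed padding choice).
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_with_derivatives(log_mel):
    """Concatenate static, first-derivative and second-derivative features per frame."""
    d1 = delta(log_mel)
    d2 = delta(d1)
    return [s + a + b for s, a, b in zip(log_mel, d1, d2)]


# Toy input (not real audio): 6 frames, 2 "mel bands", rising linearly over time.
log_mel = [[float(t), 2.0 * t] for t in range(6)]
features = stack_with_derivatives(log_mel)
```

Each frame in `features` is three times as wide as the static input, and away from the edges the first derivative of a linear ramp equals its slope.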

Disrupting Audio Event Detection Deep Neural Networks with White Noise

Technologies ◽

10.3390/technologies9030064 ◽

2021 ◽

Vol 9 (3) ◽

pp. 64

Author(s):

Rodrigo dos Santos ◽

Ashwitha Kassetty ◽

Shirin Nilizadeh

Keyword(s):

Neural Networks ◽

White Noise ◽

Convolutional Neural Networks ◽

Event Detection ◽

Recurrent Neural Networks ◽

Deep Neural Networks ◽

Audio Event ◽

Noise Disturbances ◽

Classification Tasks ◽

Percent Success

Audio event detection (AED) systems can leverage the power of specialized algorithms for detecting the presence of a specific sound of interest within audio captured from the environment. More recent approaches rely on deep learning algorithms, such as convolutional neural networks and convolutional recurrent neural networks. Given these conditions, it is important to assess how vulnerable these systems can be to attacks. As such, we develop AED-suited convolutional neural networks and convolutional recurrent neural networks, and then attack them with white noise disturbances, which are conceived to be simple to implement and employ, even by attackers who are not tech-savvy. We develop this work under a safety-oriented scenario (AED systems for safety-related sounds, such as gunshots), and we show that an attacker can use such disturbances to avoid detection with up to 100 percent success. Prior work has shown that attackers can mislead image classification tasks; however, this work focuses on attacks against AED systems by tampering with their audio rather than image components. This work brings awareness to the designers and manufacturers of AED systems, as these solutions are vulnerable, yet may be trusted by individuals and families.
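
A white noise disturbance of the kind this abstract mentions can be sketched as Gaussian noise mixed into a clip at a chosen signal-to-noise ratio. The abstract does not specify how the noise is generated or scaled, so the SNR parameterization, the function names, and the synthetic test tone below are assumptions rather than the authors' procedure.

```python
import math
import random


def power(samples):
    """Mean squared amplitude of a signal."""
    return sum(x * x for x in samples) / len(samples)


def add_white_noise(signal, snr_db, seed=0):
    """Return signal plus Gaussian white noise scaled to the requested SNR (in dB)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Scale the noise so that power(signal) / power(scaled noise) == 10^(snr_db / 10).
    target_noise_power = power(signal) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / power(noise))
    return [s + scale * n for s, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """SNR implied by the difference between the noisy and clean signals."""
    residual = [b - a for a, b in zip(clean, noisy)]
    return 10.0 * math.log10(power(clean) / power(residual))


# Hypothetical clean clip: one second of a 440 Hz tone sampled at 8 kHz.
clean = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noisy = add_white_noise(clean, snr_db=0.0)
```

Lower `snr_db` values correspond to a louder disturbance; at 0 dB the noise carries as much power as the original clip.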

Audio Event Detection Using Deep Neural Networks

Journal of Digital Contents Society ◽

10.9728/dcs.2017.18.1.183 ◽

2017 ◽

Vol 18 (1) ◽

pp. 183-190 ◽

Cited By ~ 1

Author(s):

Minkyu Lim ◽

Donghyun Lee ◽

Hosung Park ◽

Ji-Hwan Kim

Keyword(s):

Neural Networks ◽

Event Detection ◽

Deep Neural Networks ◽

Audio Event

Improving Semi-Supervised Learning for Audio Classification with FixMatch

Electronics ◽

10.3390/electronics10151807 ◽

2021 ◽

Vol 10 (15) ◽

pp. 1807

Author(s):

Sascha Grollmisch ◽

Estefanía Cano

Keyword(s):

Neural Networks ◽

Supervised Learning ◽

Transfer Learning ◽

Data Transfer ◽

State Of The Art ◽

Training Data ◽

Audio Classification ◽

Image Domain ◽

Full Dataset ◽

Audio Data

Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. The commonality between recent SSL methods is that they strongly rely on the augmentation of unannotated data. This is largely unexplored for audio data. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks, including music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNN) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications always outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the CNN baseline performance using the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio data. Transfer Learning outperformed FixMatch only for the most challenging dataset from acoustic scene classification, showing that there is still room for improvement.
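
For readers unfamiliar with FixMatch, the unlabeled-data part of its loss can be sketched as follows: a prediction on a weakly augmented view becomes a pseudo-label if it is confident enough, and the model is then trained to predict that label on a strongly augmented view. This follows the general FixMatch formulation, not the modifications proposed in this paper; the threshold value of 0.95 and the toy logits are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fixmatch_unlabeled_loss(weak_logits, strong_logits, threshold=0.95):
    """Mean cross-entropy on strong views, counted only for confident pseudo-labels."""
    total, kept = 0.0, 0
    for weak, strong in zip(weak_logits, strong_logits):
        probs = softmax(weak)
        confidence = max(probs)
        if confidence < threshold:
            continue  # prediction too uncertain: this example contributes zero loss
        pseudo_label = probs.index(confidence)
        total += -math.log(softmax(strong)[pseudo_label])
        kept += 1
    # Average over the whole batch, so masked examples dilute the loss.
    return total / len(weak_logits), kept


# Toy model outputs for two unlabeled clips and three classes (made-up numbers).
weak = [[8.0, 0.0, 0.0], [0.2, 0.1, 0.0]]
strong = [[2.0, 1.0, 0.0], [0.0, 3.0, 0.0]]
loss, kept = fixmatch_unlabeled_loss(weak, strong)
```

Only the first clip passes the confidence threshold here, so `kept` is 1 and the second clip is ignored for this batch.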

Rice Leaf Diseases Prediction using Deep Neural Networks with Transfer Learning

Environmental Research ◽

10.1016/j.envres.2021.111275 ◽

2021 ◽

pp. 111275

Author(s):

N. Krishnamoorthy ◽

L. V. Narasimha Prasad ◽

C. S. Pavan Kumar ◽

Bharat Subedi ◽

Haftom Baraki Abraha ◽

...

Keyword(s):

Neural Networks ◽

Transfer Learning ◽

Deep Neural Networks ◽

Rice Leaf

Representing Deep Neural Networks Latent Space Geometries with Graphs

Algorithms ◽

10.3390/a14020039 ◽

2021 ◽

Vol 14 (2) ◽

pp. 39

Author(s):

Carlos Lassance ◽

Vincent Gripon ◽

Antonio Ortega

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Objective Function ◽

Learning Process ◽

Deep Neural Networks ◽

State Of The Art ◽

The Core ◽

Learning Tasks ◽

Latent Space

Deep Learning (DL) has attracted a lot of attention for its ability to reach state-of-the-art performance in many machine learning tasks. The core principle of DL methods consists of training composite architectures in an end-to-end fashion, where inputs are associated with outputs trained to optimize an objective function. Because of their compositional nature, DL architectures naturally exhibit several intermediate representations of the inputs, which belong to so-called latent spaces. When treated individually, these intermediate representations are most of the time unconstrained during the learning process, as it is unclear which properties should be favored. However, when processing a batch of inputs concurrently, the corresponding set of intermediate representations exhibits relations (what we call a geometry) on which desired properties can be sought. In this work, we show that it is possible to introduce constraints on these latent geometries to address various problems. In more detail, we propose to represent geometries by constructing similarity graphs from the intermediate representations obtained when processing a batch of inputs. By constraining these Latent Geometry Graphs (LGGs), we address the following three problems: (i) reproducing the behavior of a teacher architecture is achieved by mimicking its geometry, (ii) designing efficient embeddings for classification is achieved by targeting specific geometries, and (iii) robustness to deviations on inputs is achieved via enforcing smooth variation of geometry between consecutive latent spaces. Using standard vision benchmarks, we demonstrate the ability of the proposed geometry-based methods in solving the considered problems.
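
The idea of a similarity graph built from a batch of intermediate representations can be sketched in a few lines. The abstract does not state which similarity measure or graph comparison the authors use; cosine similarity, the squared-difference distance, and the function names below are assumptions chosen for illustration.

```python
import math


def cosine_similarity_graph(batch):
    """Adjacency matrix of pairwise cosine similarities, with self-loops removed."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in batch]
    n = len(batch)
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dot = sum(a * b for a, b in zip(batch[i], batch[j]))
                graph[i][j] = dot / (norms[i] * norms[j])
    return graph


def geometry_distance(graph_a, graph_b):
    """Sum of squared differences between two adjacency matrices."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(graph_a, graph_b)
               for a, b in zip(row_a, row_b))


# Hypothetical activations for a batch of three inputs at one layer.
teacher = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
student = [[2.0, 0.0], [0.0, 3.0], [0.5, 0.5]]  # same directions, different scales
other = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

same = geometry_distance(cosine_similarity_graph(teacher),
                         cosine_similarity_graph(student))
different = geometry_distance(cosine_similarity_graph(teacher),
                              cosine_similarity_graph(other))
```

Because each graph has one node per input in the batch, two layers (or two networks) can be compared this way even when their representations have different widths.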

EMOTIONS RECOGNITION IN HUMAN SPEECH USING DEEP NEURAL NETWORKS

Vestnik komp'iuternykh i informatsionnykh tekhnologii ◽

10.14489/vkit.2021.01.pp.044-051 ◽

2021 ◽

pp. 44-51

Author(s):

E. Yu. Shchetinin

Keyword(s):

Neural Network ◽

Machine Learning ◽

Neural Networks ◽

Convolutional Neural Network ◽

Recurrent Neural Network ◽

Deep Neural Networks ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Audio Recordings ◽

Computer Studies

The recognition of human emotions is one of the most relevant and rapidly developing areas of modern speech technologies, and the recognition of emotions in speech (RER) is the most in-demand part of them. In this paper, we propose a computer model of emotion recognition based on an ensemble of a bidirectional recurrent neural network with LSTM memory cells and the deep convolutional neural network ResNet18. Computer studies are carried out on the RAVDESS database, which contains emotional human speech. RAVDESS is a data set containing 7356 files. Entries contain the following emotions: 0 – neutral, 1 – calm, 2 – happiness, 3 – sadness, 4 – anger, 5 – fear, 6 – disgust, 7 – surprise. In total, the database contains 16 classes (8 emotions divided into male and female) for a total of 1440 samples (speech only). To train machine learning algorithms and deep neural networks to recognize emotions, existing audio recordings must be pre-processed to extract the main characteristic features of particular emotions. This was done using Mel-frequency cepstral coefficients, chroma coefficients, and characteristics of the frequency spectrum of the audio recordings. Various neural network models for emotion recognition are studied on the data described above. In addition, machine learning algorithms were used for comparative analysis. Thus, the following models were trained during the experiments: logistic regression (LR), a support vector machine (SVM) classifier, decision tree (DT), random forest (RF), gradient boosting over trees (XGBoost), a convolutional neural network (CNN, ResNet18), a recurrent neural network (RNN), and an ensemble of convolutional and recurrent networks (Stacked CNN-RNN). The results show that the neural networks achieved much higher accuracy in recognizing and classifying emotions than the machine learning algorithms used. Of the three neural network models presented, the CNN + BLSTM ensemble showed the highest accuracy.

Natural Images Allow Universal Adversarial Attacks on Medical Image Classification Using Deep Neural Networks with Transfer Learning

10.21203/rs.3.rs-757225/v1 ◽

2021 ◽

Author(s):

Akinori Minagi ◽

Hokuto Hirano ◽

Kazuhiro Takemoto

Keyword(s):

Neural Networks ◽

Image Classification ◽

Transfer Learning ◽

Medical Image ◽

Deep Neural Networks ◽

Disease Diagnosis ◽

Natural Images ◽

Fine Tuning ◽

Security And Privacy ◽

Medical Image Classification

Transfer learning from natural images is widely used in deep neural networks (DNNs) for medical image classification to achieve computer-aided clinical diagnosis. Although the adversarial vulnerability of DNNs hinders practical applications owing to the high stakes of diagnosis, adversarial attacks are expected to be limited because training data, which are often required for adversarial attacks, are generally unavailable for reasons of security and privacy preservation. Nevertheless, we hypothesized that adversarial attacks are also possible using natural images because pre-trained models do not change significantly after fine-tuning. We focused on three representative DNN-based medical image classification tasks (i.e., skin cancer, referable diabetic retinopathy, and pneumonia classifications) and investigated whether medical DNN models with transfer learning are vulnerable to universal adversarial perturbations (UAPs) generated using natural images. UAPs from natural images are useful for both non-targeted and targeted attacks. The performance of UAPs from natural images was significantly higher than that of random controls, although slightly lower than that of UAPs from training images. Vulnerability to UAPs from natural images was observed between different natural image datasets and between different model architectures. The use of transfer learning causes a security hole, which decreases the reliability and safety of computer-based disease diagnosis. Model training from random initialization (without transfer learning) reduced the performance of UAPs from natural images; however, it did not completely avoid vulnerability to UAPs. Vulnerability to UAPs from natural images will become a significant security threat.

Protein Family-Specific Models Using Deep Neural Networks and Transfer Learning Improve Virtual Screening and Highlight the Need for More Data

Journal of Chemical Information and Modeling ◽

10.1021/acs.jcim.8b00350 ◽

2018 ◽

Vol 58 (11) ◽

pp. 2319-2330 ◽

Cited By ~ 33

Author(s):

Fergus Imrie ◽

Anthony R. Bradley ◽

Mihaela van der Schaar ◽

Charlotte M. Deane

Keyword(s):

Neural Networks ◽

Virtual Screening ◽

Transfer Learning ◽

Deep Neural Networks ◽

Protein Family
