scholarly journals Deep Learning for Audio Event Detection and Tagging on Low-Resource Datasets

2018 ◽  
Vol 8 (8) ◽  
pp. 1397 ◽  
Author(s):  
Veronica Morfi ◽  
Dan Stowell

In training a deep learning system to perform audio transcription, two practical problems may arise. Firstly, most datasets are weakly labelled, having only a list of events present in each recording without any temporal information for training. Secondly, deep neural networks need a very large amount of labelled training data to achieve good quality performance, yet in practice it is difficult to collect enough samples for most classes of interest. In this paper, we propose factorising the final task of audio transcription into multiple intermediate tasks in order to improve the training performance when dealing with this kind of low-resource datasets. We evaluate three data-efficient approaches of training a stacked convolutional and recurrent neural network for the intermediate tasks. Our results show that different methods of training have different advantages and disadvantages.

Author(s):  
Veronica Morfi ◽  
Dan Stowell

In training a deep learning system to perform audio transcription, two practical problems may arise. Firstly, most datasets are weakly labelled, having only a list of events present in each recording without any temporal information for training. Secondly, deep neural networks need a very large amount of labelled training data to achieve good quality performance, yet in practice it is difficult to collect enough samples for most classes of interest. In this paper, we propose factorising the final task of audio transcription into multiple intermediate tasks in order to improve the training performance when dealing with this kind of low-resource datasets. We evaluate three data-efficient approaches of training a stacked convolutional and recurrent neural network for the intermediate tasks. Our results show that different methods of training have different advantages and disadvantages.


2021 ◽  
Vol 15 (5) ◽  
pp. 669-677
Author(s):  
Harumo Sasatake ◽  
Ryosuke Tasaki ◽  
Takahito Yamashita ◽  
Naoki Uchiyama ◽  
◽  
...  

Population aging has become a major problem in developed countries. As the labor force declines, robot arms are expected to replace human labor for simple tasks. A robotic arm attaches a tool specialized for a task and acquires the movement through teaching by an engineer with expert knowledge. However, the number of such engineers is limited; therefore, a teaching method that can be used by non-technical personnel is necessitated. As a teaching method, deep learning can be used to imitate human behavior and tool usage. However, deep learning requires a large amount of training data for learning. In this study, the target task of the robot is to sweep multiple pieces of dirt using a broom. The proposed learning system can estimate the initial parameters for deep learning based on experience, as well as the shape and physical properties of the tools. It can reduce the number of training data points when learning a new tool. A virtual reality system is used to move the robot arm easily and safely, as well as to create training data for imitation. In this study, cleaning experiments are conducted to evaluate the effectiveness of the proposed method. The experimental results confirm that the proposed method can accelerate the learning speed of deep learning and acquire cleaning ability using a small amount of training data.


Heart ◽  
2018 ◽  
Vol 104 (23) ◽  
pp. 1921-1928 ◽  
Author(s):  
Ming-Zher Poh ◽  
Yukkee Cheung Poh ◽  
Pak-Hei Chan ◽  
Chun-Ka Wong ◽  
Louise Pun ◽  
...  

ObjectiveTo evaluate the diagnostic performance of a deep learning system for automated detection of atrial fibrillation (AF) in photoplethysmographic (PPG) pulse waveforms.MethodsWe trained a deep convolutional neural network (DCNN) to detect AF in 17 s PPG waveforms using a training data set of 149 048 PPG waveforms constructed from several publicly available PPG databases. The DCNN was validated using an independent test data set of 3039 smartphone-acquired PPG waveforms from adults at high risk of AF at a general outpatient clinic against ECG tracings reviewed by two cardiologists. Six established AF detectors based on handcrafted features were evaluated on the same test data set for performance comparison.ResultsIn the validation data set (3039 PPG waveforms) consisting of three sequential PPG waveforms from 1013 participants (mean (SD) age, 68.4 (12.2) years; 46.8% men), the prevalence of AF was 2.8%. The area under the receiver operating characteristic curve (AUC) of the DCNN for AF detection was 0.997 (95% CI 0.996 to 0.999) and was significantly higher than all the other AF detectors (AUC range: 0.924–0.985). The sensitivity of the DCNN was 95.2% (95% CI 88.3% to 98.7%), specificity was 99.0% (95% CI 98.6% to 99.3%), positive predictive value (PPV) was 72.7% (95% CI 65.1% to 79.3%) and negative predictive value (NPV) was 99.9% (95% CI 99.7% to 100%) using a single 17 s PPG waveform. Using the three sequential PPG waveforms in combination (<1 min in total), the sensitivity was 100.0% (95% CI 87.7% to 100%), specificity was 99.6% (95% CI 99.0% to 99.9%), PPV was 87.5% (95% CI 72.5% to 94.9%) and NPV was 100% (95% CI 99.4% to 100%).ConclusionsIn this evaluation of PPG waveforms from adults screened for AF in a real-world primary care setting, the DCNN had high sensitivity, specificity, PPV and NPV for detecting AF, outperforming other state-of-the-art methods based on handcrafted features.


Sensors ◽  
2021 ◽  
Vol 21 (22) ◽  
pp. 7527
Author(s):  
Mugdim Bublin

Distributed Acoustic Sensing (DAS) is a promising new technology for pipeline monitoring and protection. However, a big challenge is distinguishing between relevant events, like intrusion by an excavator near the pipeline, and interference, like land machines. This paper investigates whether it is possible to achieve adequate detection accuracy with classic machine learning algorithms using simulations and real system implementation. Then, we compare classical machine learning with a deep learning approach and analyze the advantages and disadvantages of both approaches. Although acceptable performance can be achieved with both approaches, preliminary results show that deep learning is the more promising approach, eliminating the need for laborious feature extraction and offering a six times lower event detection delay and twelve times lower execution time. However, we achieved the best results by combining deep learning with the knowledge-based and classical machine learning approaches. At the end of this manuscript, we propose general guidelines for efficient system design combining knowledge-based, classical machine learning, and deep learning approaches.


Author(s):  
Shao-Yen Tseng ◽  
Juncheng Li ◽  
Yun Wang ◽  
Florian Metze ◽  
Joseph Szurley ◽  
...  

2021 ◽  
Author(s):  
Mohsin Y Ahmed ◽  
Li Zhu ◽  
Md Mahbubur Rahman ◽  
Tousif Ahmed ◽  
Jilong Kuang ◽  
...  

Author(s):  
Dirk Siegmund ◽  
Biying Fu ◽  
Adán José-García ◽  
Ahmad Salahuddin ◽  
Arjan Kuijper

Due to the deforming and dynamically changing textile fibers, the quality assurance of cleaned industrial textiles is still a mostly manual task. Usually, textiles need to be spread flat, in order to detect defects using computer vision inspection methods. Already known methods for detecting defects on such inhomogeneous, voluminous surfaces use mainly supervised methods based on deep neural networks and require lots of labeled training data. In contrast, we present a novel unsupervised method, based on SURF keypoints, that does not require any training data. We propose using their location, number and orientation in order to group them into geographically close clusters. Keypoint clusters also indicate the exact position of the defect at the same time. We furthermore compared our approach to supervised methods using deep learning. The presented processing pipeline shows how normalization and classification methods need to be combined, in order to reliably detect fiber defects such as cuts and holes. We evaluate the performance of our system in real-world settings with images of piles of textiles, taken in stereo vision. Our results show that our novel unsupervised classification method using keypoint clustering achieves comparable results to other supervised methods.


2019 ◽  
Vol 0 (9/2019) ◽  
pp. 13-18
Author(s):  
Karol Antczak

The paper discusses regularization properties of artificial data for deep learning. Artificial datasets allow to train neural networks in the case of a real data shortage. It is demonstrated that the artificial data generation process, described as injecting noise to high-level features, bears several similarities to existing regularization methods for deep neural networks. One can treat this property of artificial data as a kind of “deep” regularization. It is thus possible to regularize hidden layers of the network by generating the training data in a certain way.


Sign in / Sign up

Export Citation Format

Share Document