weakly labeled data
Recently Published Documents

TOTAL DOCUMENTS: 42 (FIVE YEARS: 23)
H-INDEX: 7 (FIVE YEARS: 3)

2021 ◽ Vol 11 (18) ◽ pp. 8581
Author(s): Yuzhuo Liu, Hangting Chen, Jian Wang, Pei Wang, Pengyuan Zhang

In recent years, the combined use of synthetic strongly labeled data, weakly labeled data, and unlabeled data has drawn much research attention in semi-supervised acoustic event detection (SAED). The classic self-training method makes predictions on unlabeled data and then selects high-probability predictions as pseudo-labels for retraining. Such models have proven effective in SAED. However, probabilities are poorly calibrated confidence estimates, and samples with low probabilities are discarded. Hence, we introduce a confidence-based semi-supervised acoustic event detection (C-SAED) framework. The C-SAED method learns confidence explicitly and retrains on all of the data, weighting each sample by its confidence. Additionally, we apply a power pooling function whose coefficient is trained automatically, which exploits weakly labeled data more efficiently. The experimental results demonstrate that the generated confidence is proportional to the accuracy of the predictions. Our C-SAED framework achieves a relative error rate reduction of 34% compared with the baseline model.
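A minimal sketch of such a trainable pooling layer, assuming a weighted-average form with a learnable per-class exponent, is given below; the names and the exact pooling formula are illustrative assumptions, not the authors' released implementation.

# Sketch of a power pooling layer with a trainable exponent (assumed formulation).
import torch
import torch.nn as nn

class PowerPooling(nn.Module):
    """Aggregate frame-level probabilities into a clip-level probability.

    Uses the weighted-average form sum_t p_t^(n+1) / sum_t p_t^n, where the
    exponent n is a learnable parameter (one value per event class).
    """
    def __init__(self, num_classes: int, init_exponent: float = 1.0):
        super().__init__()
        self.exponent = nn.Parameter(torch.full((num_classes,), init_exponent))

    def forward(self, frame_probs: torch.Tensor) -> torch.Tensor:
        # frame_probs: (batch, time, num_classes), values in (0, 1)
        n = torch.clamp(self.exponent, min=0.0)           # keep the exponent non-negative
        weights = frame_probs.pow(n)                       # louder evidence gets more weight
        clip_probs = (frame_probs * weights).sum(dim=1) / (weights.sum(dim=1) + 1e-8)
        return clip_probs                                   # (batch, num_classes)

# Usage: pool 500 frames of per-class probabilities into one clip-level score.
pool = PowerPooling(num_classes=10)
frames = torch.rand(4, 500, 10)
print(pool(frames).shape)   # torch.Size([4, 10])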


2021
Author(s): Yuxing Tang, Zhenjie Cao, Yanbo Zhang, Zhicheng Yang, Zongcheng Ji, ...

Author(s): Sichen Liu, Feiran Yang, Yin Cao, Jun Yang

Sound event detection (SED), which is typically treated as a supervised problem, aims at detecting the types of sound events and their temporal information, which requires estimating onset and offset times for each sound event at the frame level. Many available sound event datasets contain only audio tags without precise temporal information; such datasets are therefore classified as weakly labeled. In this paper, we propose a novel source-separation-based method trained on weakly labeled data to solve SED problems. We build a dilated depthwise separable convolution block (DDC-block) to estimate time-frequency (T-F) masks of each sound event from a T-F representation of an audio clip. The DDC-block is experimentally shown to be more effective and computationally lighter than a "VGG-like" block. To fully utilize the frequency characteristics of sound events, we then propose a frequency-dependent auto-pooling (FAP) function to obtain the clip-level presence probability of each sound event class. The combination of the two schemes, named the DDC-FAP method, is evaluated on the DCASE 2018 Task 2, DCASE 2020 Task 4, and DCASE 2017 Task 4 datasets. The results show that DDC-FAP outperforms the state-of-the-art source-separation-based method on the SED task.
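For illustration, a minimal sketch of a dilated depthwise separable convolution block follows; the kernel size, dilation, normalization, and activation are assumptions, not the paper's exact DDC-block.

# Sketch of a dilated depthwise separable convolution block (assumed layout).
import torch
import torch.nn as nn

class DDCBlock(nn.Module):
    """A depthwise conv with dilation followed by a 1x1 pointwise conv,
    which is lighter than a standard "VGG-like" 3x3 convolution stack."""
    def __init__(self, in_ch: int, out_ch: int, dilation: int = 2):
        super().__init__()
        self.depthwise = nn.Conv2d(
            in_ch, in_ch, kernel_size=3, padding=dilation,
            dilation=dilation, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, frequency) T-F representation
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Usage: one block applied to a batch of log-mel spectrograms.
block = DDCBlock(in_ch=1, out_ch=64)
x = torch.randn(8, 1, 500, 64)      # (batch, 1, frames, mel bins)
print(block(x).shape)               # torch.Size([8, 64, 500, 64])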


Technologies ◽ 2021 ◽ Vol 9 (1) ◽ pp. 10
Author(s): Debapriya Banerjee, Maria Kyrarini, Won Hwa Kim

Weakly labeled data are inevitable in various research areas of artificial intelligence (AI) where only partial knowledge about the complete dataset is available. One reason for weakly labeled data in AI is a shortage of accurately labeled data. Strict privacy controls or accidental loss may also cause missing-data problems. However, supervised machine learning (ML) requires accurately labeled data to solve a problem successfully. Data labeling is difficult and time-consuming, as it requires manual work and high accuracy, and sometimes human experts must be involved (e.g., labeled medical data). In contrast, unlabeled data are inexpensive and easily available. Because labeled training data are scarce, researchers sometimes obtain only one or a few data points per category or label. Training a supervised ML model from such a small set of labeled data is a challenging task. The objective of this research is to recover the missing labels of the dataset with a semi-supervised ML approach. In this work, a novel convolutional neural network-based framework is trained with a few instances per class to perform metric learning. The dataset is then converted into a graph signal, which is recovered using a recovery algorithm (RA) in the graph Fourier transform domain. The proposed approach was evaluated on a Fashion dataset in terms of accuracy and precision and performed significantly better than graph neural networks and other state-of-the-art methods.
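The sketch below illustrates the general idea of recovering a partially observed label signal in the graph Fourier domain; the Gaussian similarity graph, the bandlimited assumption, and the least-squares fit are illustrative assumptions, not the paper's specific recovery algorithm (RA).

# Sketch: recover missing labels as a bandlimited graph signal via the GFT.
import numpy as np

def recover_labels(embeddings: np.ndarray, labels: np.ndarray,
                   known_mask: np.ndarray, k: int = 10) -> np.ndarray:
    """embeddings: (N, D) metric-learned features; labels: (N,) real-valued,
    meaningful only where known_mask is True; returns recovered labels."""
    # 1. Similarity graph from embedding distances (Gaussian kernel).
    dists = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    W = np.exp(-dists**2 / (2 * np.median(dists)**2))
    np.fill_diagonal(W, 0.0)

    # 2. Graph Laplacian; its eigenvectors define the graph Fourier basis.
    L = np.diag(W.sum(axis=1)) - W
    _, U = np.linalg.eigh(L)          # columns sorted by increasing graph frequency

    # 3. Assume the label signal is bandlimited to the k lowest graph
    #    frequencies and fit those coefficients on the known entries only.
    Uk = U[:, :k]
    coeffs, *_ = np.linalg.lstsq(Uk[known_mask], labels[known_mask], rcond=None)

    # 4. Synthesize the full signal; keep the known labels untouched.
    recovered = Uk @ coeffs
    recovered[known_mask] = labels[known_mask]
    return recovered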


2021
Author(s): Haoming Jiang, Danqing Zhang, Tianyu Cao, Bing Yin, Tuo Zhao

Author(s): Haytham M. Fayek, Anurag Kumar

Recognizing sounds is a key aspect of computational audio scene analysis and machine perception. In this paper, we advocate that sound recognition is inherently a multi-modal audiovisual task, in that it is easier to differentiate sounds using both the audio and visual modalities rather than one or the other. We present an audiovisual fusion model that learns to recognize sounds from weakly labeled video recordings. The proposed fusion model utilizes an attention mechanism to dynamically combine the outputs of the individual audio and visual models. Experiments on the large-scale sound event dataset AudioSet demonstrate the efficacy of the proposed model, which outperforms single-modal models as well as state-of-the-art fusion and multi-modal models. We achieve a mean Average Precision (mAP) of 46.16 on AudioSet, outperforming the prior state of the art by approximately +4.35 mAP (relative: 10.4%).
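Below is a minimal sketch of what such attention-based modality fusion can look like; the per-class softmax gating, feature dimensions, and module names are assumptions for illustration rather than the authors' architecture.

# Sketch of attention-based audiovisual fusion (assumed gating form).
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, audio_dim: int, visual_dim: int, num_classes: int):
        super().__init__()
        self.audio_head = nn.Linear(audio_dim, num_classes)
        self.visual_head = nn.Linear(visual_dim, num_classes)
        # The attention network produces a per-class weight for each modality.
        self.attn = nn.Linear(audio_dim + visual_dim, 2 * num_classes)
        self.num_classes = num_classes

    def forward(self, audio_feat: torch.Tensor, visual_feat: torch.Tensor):
        pa = torch.sigmoid(self.audio_head(audio_feat))     # audio prediction
        pv = torch.sigmoid(self.visual_head(visual_feat))   # visual prediction
        w = self.attn(torch.cat([audio_feat, visual_feat], dim=-1))
        w = torch.softmax(w.view(-1, 2, self.num_classes), dim=1)  # weights sum to 1 per class
        return w[:, 0] * pa + w[:, 1] * pv                   # fused clip-level scores

# Usage: fuse pooled audio and visual clip embeddings for AudioSet's 527 classes.
model = AttentionFusion(audio_dim=128, visual_dim=512, num_classes=527)
scores = model(torch.randn(2, 128), torch.randn(2, 512))
print(scores.shape)   # torch.Size([2, 527])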

