Semi-Supervised Learning with Data Augmentation for End-to-End ASR

Abstract: In the literature, machine learning-based studies of sentiment analysis usually rely on supervised learning, which requires pre-labeled datasets that are large enough for the target domain. Building such datasets is tedious, expensive, and time-consuming, and models trained on them struggle with unseen data. This paper applies semi-supervised learning to Vietnamese sentiment analysis, for which only limited datasets are available. We summarize many preprocessing techniques used to improve performance: cleaning and normalizing data, negation handling, and intensification handling. We also present data augmentation techniques, which generate new data from the original data to enrich the training set without user intervention. In experiments, we evaluate various aspects and obtain competitive results that may motivate future proposals.
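Negation handling, one of the preprocessing steps listed above, is often implemented by marking the words that fall under a negation cue, so that a classifier sees "không hay" (not good) differently from "hay" (good). The sketch below is an illustration under stated assumptions, not the paper's own procedure: it assumes whitespace-tokenized input, a small hypothetical cue set (`NEGATION_CUES`), and a fixed scope of following tokens. Vietnamese words are often multi-syllable, so a real pipeline would typically run word segmentation first and count the scope in segmented words.

```python
# Hypothetical set of Vietnamese negation cues; a real system would use a curated list.
NEGATION_CUES = {"không", "chẳng", "chả", "chưa"}


def mark_negation(tokens: list[str], scope: int = 1) -> list[str]:
    """Prefix the `scope` tokens that follow a negation cue with NOT_.

    The cue itself is dropped, since its effect is now carried by the prefix.
    """
    out: list[str] = []
    remaining = 0  # how many upcoming tokens are still inside the negation scope
    for tok in tokens:
        if tok.lower() in NEGATION_CUES:
            remaining = scope  # a new cue (re)opens the scope
            continue
        if remaining > 0:
            out.append("NOT_" + tok)
            remaining -= 1
        else:
            out.append(tok)
    return out


print(mark_negation("phim này không hay".split()))
# ['phim', 'này', 'NOT_hay']
```

Dropping the cue and prefixing the affected word keeps the vocabulary small for bag-of-words models; double negation is not handled in this sketch.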

A semi-supervised learning detection method for vision-based monitoring of construction sites by integrating teacher-student networks and data augmentation

Advanced Engineering Informatics, Vol 50, pp. 101372 (2021). DOI: 10.1016/j.aei.2021.101372

Author(s): Bo Xiao, Yuxuan Zhang, Yuan Chen, Xianfei Yin

Keyword(s): Supervised Learning, Data Augmentation, Detection Method, Construction Sites, Teacher Student

Impact of data augmentation on supervised learning for a moving mid-frequency source

The Journal of the Acoustical Society of America, Vol 150 (5), pp. 3914-3928 (2021). DOI: 10.1121/10.0007284

Author(s): J. A. Castro-Correa, M. Badiey, T. B. Neilsen, D. P. Knobles, W. S. Hodgkiss

Keyword(s): Supervised Learning, Data Augmentation, Frequency Source

Self-Supervised Contextual Data Augmentation for Natural Language Processing

Symmetry, Vol 11 (11), pp. 1393 (2019). DOI: 10.3390/sym11111393

Author(s): Dongju Park, Chang Wook Ahn

Keyword(s): Supervised Learning, Language Processing, Recurrent Neural Networks, Question Answering, Data Augmentation, Language Model, Contextual Data, External Data, Label Information, Benchmark Datasets

In this paper, we propose a novel data augmentation method that takes the target context of the data into account via self-supervised learning. Instead of looking for exact synonyms of masked words, the proposed method finds words that can replace the original words given the context. For self-supervised learning, we can employ the masked language model (MLM), which masks a specific word within a sentence and then recovers the original word. The MLM learns the context of a sentence through asymmetric inputs and outputs. However, rather than using the existing MLM as is, we propose a label-masked language model (LMLM) that includes label information in the mask tokens used by the MLM, so that the MLM can be used effectively on labeled data. The augmentation method performs self-supervised learning with the LMLM and then generates augmented data with the trained model. We demonstrate through several experiments that the proposed method improves the classification accuracy of recurrent neural network-based and convolutional neural network-based classifiers on text classification benchmark datasets, including the Stanford Sentiment Treebank-5 (SST5), the Stanford Sentiment Treebank-2 (SST2), the subjectivity dataset (Subj), the Multi-Perspective Question Answering (MPQA) dataset, the Movie Reviews (MR) dataset, and the Text Retrieval Conference (TREC) dataset. In addition, because the proposed method does not use external data, it eliminates the time spent collecting external data or pre-training on it.
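The label-masked idea can be sketched in two steps: replace words with a mask token that carries the sentence's label, then let a model fill the masks to produce a new sentence with the same label. In this minimal sketch, the token format `[MASK_<label>]`, the helpers `label_mask` and `augment`, and the `fill` callback (standing in for the trained LMLM) are all hypothetical; training the model itself is out of scope.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple


def label_mask(tokens: List[str], label: str, mask_prob: float = 0.15,
               rng: Optional[random.Random] = None) -> Tuple[List[str], Dict[int, str]]:
    """Mask tokens with a label-specific mask token (hypothetical format [MASK_<label>]).

    Returns the masked sequence and a map from each masked position to its
    original word, i.e. the targets a label-masked LM would learn to recover.
    """
    if rng is None:
        rng = random.Random()
    mask_token = f"[MASK_{label}]"
    masked: List[str] = []
    targets: Dict[int, str] = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets[i] = tok
        else:
            masked.append(tok)
    return masked, targets


def augment(tokens: List[str], label: str,
            fill: Callable[[List[str], int], str],
            mask_prob: float = 0.15,
            rng: Optional[random.Random] = None) -> List[str]:
    """Produce one augmented sentence that keeps the original label.

    `fill(masked, i)` stands in for the trained model: it sees the
    label-masked context and returns a replacement word for position i.
    """
    masked, targets = label_mask(tokens, label, mask_prob, rng)
    augmented = list(masked)
    for i in targets:
        augmented[i] = fill(masked, i)
    return augmented
```

Because the mask token itself encodes the label, a model trained this way can prefer replacements consistent with that label (for example, positive adjectives under `[MASK_pos]`). In this sketch every masked position is filled from the same masked context, so replacements are chosen independently; refilling positions one at a time would be a reasonable variant.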

GrowingNet: An end-to-end growing network for semi-supervised learning

Computer Communications, Vol 151, pp. 208-215 (2020). DOI: 10.1016/j.comcom.2020.01.003

Author(s): Qifei Zhang, Xiaomo Yu

Keyword(s): Supervised Learning, End To End, Growing Network

Graph Self Supervised Learning: the BT, the HSIC, and the VICReg

DOI: 10.31219/osf.io/tvmdu (2021)

Author(s): Sayan Nag

Keyword(s): Neural Networks, Supervised Learning, Loss Function, Data Augmentation, Learning Strategy, Loss Functions, Augmentation Strategies, Batch Sizes, Graph Neural Networks, The Impact

Self-supervised learning and pre-training strategies have developed rapidly over the last few years, especially for Convolutional Neural Networks (CNNs). Recently, such methods have also been applied to Graph Neural Networks (GNNs). In this paper, we use a graph-based self-supervised learning strategy with several loss functions (Barlow Twins, HSIC, and VICReg) that have previously shown promising results with CNNs. We also propose a hybrid loss function, VICRegHSIC, that combines the advantages of VICReg and HSIC. We compare the performance of these methods on two datasets, MUTAG and PROTEINS, and also explore the impact of different batch sizes, projector dimensions, and data augmentation strategies. The results are preliminary, and we will continue to explore other datasets.
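Of the three losses named above, VICReg is the simplest to write down: an invariance term (mean squared error between the embeddings of two augmented views, here averaged over all n·d entries), a variance term (a hinge that keeps each dimension's standard deviation above γ), and a covariance term (squared off-diagonal covariances, pushing dimensions to decorrelate). The sketch below is the standard VICReg formulation in plain Python with the default coefficients from the original VICReg paper (λ = μ = 25, ν = 1, γ = 1, ε = 1e-4); it is not the author's VICRegHSIC hybrid, whose exact form is not given here. Embeddings are assumed to be lists of n ≥ 2 rows of d floats, for example pooled graph embeddings of two augmented views of the same batch of graphs.

```python
import math
from typing import List

Matrix = List[List[float]]  # n rows (batch) x d columns (embedding dimensions)


def _centred_columns(z: Matrix) -> Matrix:
    """Return the d columns of z, each with its batch mean subtracted."""
    out = []
    for col in zip(*z):
        m = sum(col) / len(col)
        out.append([x - m for x in col])
    return out


def _variance_term(z: Matrix, gamma: float, eps: float) -> float:
    """Mean over dimensions of max(0, gamma - sqrt(Var + eps))."""
    n = len(z)
    cols = _centred_columns(z)
    hinge = 0.0
    for col in cols:
        var = sum(x * x for x in col) / (n - 1)  # unbiased variance of this dimension
        hinge += max(0.0, gamma - math.sqrt(var + eps))
    return hinge / len(cols)


def _covariance_term(z: Matrix) -> float:
    """Sum of squared off-diagonal covariance entries, divided by d."""
    n = len(z)
    cols = _centred_columns(z)
    d = len(cols)
    total = 0.0
    for i in range(d):
        for j in range(d):
            if i != j:
                c_ij = sum(a * b for a, b in zip(cols[i], cols[j])) / (n - 1)
                total += c_ij ** 2
    return total / d


def vicreg_loss(za: Matrix, zb: Matrix, lam: float = 25.0, mu: float = 25.0,
                nu: float = 1.0, gamma: float = 1.0, eps: float = 1e-4) -> float:
    """VICReg loss between two views za and zb of the same batch."""
    n, d = len(za), len(za[0])
    invariance = sum((a - b) ** 2
                     for ra, rb in zip(za, zb)
                     for a, b in zip(ra, rb)) / (n * d)
    variance = _variance_term(za, gamma, eps) + _variance_term(zb, gamma, eps)
    covariance = _covariance_term(za) + _covariance_term(zb)
    return lam * invariance + mu * variance + nu * covariance
```

Identical, well-spread views give a loss of 0, while a collapsed batch in which every embedding is the same (e.g. `[[1.0, 1.0], [1.0, 1.0]]` for both views) returns roughly 49.5, all of it from the variance term. Penalizing that collapse is exactly what the variance term is there for.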
