Semi-Supervised Learning with Data Augmentation for End-to-End ASR

Author(s):  
Felix Weninger ◽  
Franco Mana ◽  
Roberto Gemello ◽  
Jesús Andrés-Ferrer ◽  
Puming Zhan
Author(s):  
Vladislav Neskorniuk ◽  
Pedro J. Freire ◽  
Antonio Napoli ◽  
Bernhard Spinnler ◽  
Wolfgang Schairer ◽  
...  

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Huu-Thanh Duong ◽  
Tram-Anh Nguyen-Thi

In the literature, machine learning-based studies of sentiment analysis usually rely on supervised learning, which requires sufficiently large pre-labeled datasets in specific domains. Such datasets are tedious, expensive, and time-consuming to build, and it remains hard to handle unseen data. This paper applies semi-supervised learning to Vietnamese sentiment analysis, for which datasets are limited. We summarize the preprocessing techniques used to clean and normalize the data, together with negation handling and intensification handling, to improve performance. We also present data augmentation techniques, which generate new data from the original data to enrich the training data without user intervention. In our experiments, we evaluate several aspects of the approach and obtain competitive results that may motivate future work.


2021 ◽  
Vol 150 (5) ◽  
pp. 3914-3928
Author(s):  
J. A. Castro-Correa ◽  
M. Badiey ◽  
T. B. Neilsen ◽  
D. P. Knobles ◽  
W. S. Hodgkiss

Symmetry ◽  
2019 ◽  
Vol 11 (11) ◽  
pp. 1393
Author(s):  
Dongju Park ◽  
Chang Wook Ahn

In this paper, we propose a novel data augmentation method that takes the target context of the data into account via self-supervised learning. Instead of looking for exact synonyms of masked words, the proposed method finds words that can replace the original words given the context. For self-supervised learning, we can employ the masked language model (MLM), which masks a specific word within a sentence and predicts the original word. The MLM learns the context of a sentence through asymmetrical inputs and outputs. Rather than using the existing MLM directly, we propose a label-masked language model (LMLM) that includes label information in the mask tokens, so that the MLM can be used effectively on data with label information. The augmentation method performs self-supervised learning using the LMLM and then carries out data augmentation with the trained model. Through several experiments, we demonstrate that the proposed method improves the classification accuracy of recurrent neural network-based and convolutional neural network-based classifiers on text classification benchmark datasets, including the Stanford Sentiment Treebank-5 (SST5), the Stanford Sentiment Treebank-2 (SST2), the subjectivity (Subj), the Multi-Perspective Question Answering (MPQA), the Movie Reviews (MR), and the Text Retrieval Conference (TREC) datasets. In addition, since the proposed method does not use external data, it eliminates the time spent collecting external data or pre-training on external data.
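The label-masking idea in this abstract can be illustrated with a minimal sketch. It only shows how training inputs with label-specific mask tokens might be built. The `[MASK_<label>]` token format, the `label_mask` function name, and the 15% masking rate are assumptions made for illustration, not details taken from the paper.

```python
import random


def label_mask(tokens, label, rng, mask_prob=0.15):
    """Replace randomly chosen tokens with a label-specific mask token.

    Returns (masked_tokens, targets), where targets maps each masked
    position to the original token a model would be trained to recover.
    """
    mask_token = f"[MASK_{label}]"  # hypothetical token format (assumption)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:  # mask_prob is an assumed rate
            masked.append(mask_token)
            targets[i] = tok
        else:
            masked.append(tok)
    # Make sure a non-empty sentence always has at least one masked position.
    if tokens and not targets:
        i = rng.randrange(len(tokens))
        targets[i] = tokens[i]
        masked[i] = mask_token
    return masked, targets


rng = random.Random(0)
tokens = "the film is surprisingly good".split()
masked, targets = label_mask(tokens, "positive", rng)
print(masked)
print(targets)
```

Because the mask token carries the class label, a model trained on such inputs could, in principle, propose replacement words that fit both the sentence context and the label, which is the property the abstract relies on for augmentation.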


2021 ◽  
Author(s):  
Sayan Nag

Self-supervised learning and pre-training strategies have developed over the last few years, especially for Convolutional Neural Networks (CNNs). Recently, such methods have also been applied to Graph Neural Networks (GNNs). In this paper, we use a graph-based self-supervised learning strategy with different loss functions (Barlow Twins, HSIC, VICReg) that have previously shown promising results with CNNs. We also propose a hybrid loss function, VICRegHSIC, which combines the advantages of VICReg and HSIC. The performance of these methods is compared on two datasets, MUTAG and PROTEINS. Moreover, the impact of different batch sizes, projector dimensions, and data augmentation strategies is explored. The results are preliminary, and we will continue to explore other datasets.
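As a rough illustration of one of the losses named in this abstract, the following is a minimal pure-Python sketch of the Barlow Twins objective on two batches of embeddings. It is a generic version, not the graph-specific setup of the paper. The function names and the weight `lam=5e-3` are assumptions chosen for the example.

```python
import math


def standardize(column):
    """Shift a list of values to zero mean and scale it to unit variance."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / len(column)
    std = math.sqrt(var) + 1e-12  # small constant avoids division by zero
    return [(x - mean) / std for x in column]


def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Barlow Twins loss for two views, each a list of n embeddings of size d.

    The cross-correlation matrix C between the standardized dimensions of
    the two views is pushed towards the identity: diagonal entries towards
    1 (invariance), off-diagonal entries towards 0 (redundancy reduction).
    """
    n, d = len(z_a), len(z_a[0])
    cols_a = [standardize([row[j] for row in z_a]) for j in range(d)]
    cols_b = [standardize([row[j] for row in z_b]) for j in range(d)]
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c_ij = sum(a * b for a, b in zip(cols_a[i], cols_b[j])) / n
            loss += (1.0 - c_ij) ** 2 if i == j else lam * c_ij ** 2
    return loss


view = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
flipped = [[-x for x in row] for row in view]
print(barlow_twins_loss(view, view))     # identical, decorrelated views
print(barlow_twins_loss(view, flipped))  # perfectly anti-correlated views
```

In a self-supervised setup of the kind described, `z_a` and `z_b` would be the projector outputs for two augmented versions of the same batch, which is why batch size, projector dimension, and augmentation strategy all affect the loss.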


2021 ◽  
Author(s):  
Jianwei Sun ◽  
Zhiyuan Tang ◽  
Hengxin Yin ◽  
Wei Wang ◽  
Xi Zhao ◽  
...  
