Adversarial training for few-shot text classification

2021 ◽  
Vol 14 (2) ◽  
pp. 201-214
Author(s):  
Danilo Croce ◽  
Giuseppe Castellucci ◽  
Roberto Basili

In recent years, Deep Learning methods have become very popular in classification tasks for Natural Language Processing (NLP), mainly due to their ability to achieve high performance while relying on very simple input representations, i.e., raw tokens. One drawback of deep architectures is the large amount of annotated data required for effective training. In Machine Learning, this problem is usually mitigated by semi-supervised methods or, more recently, by Transfer Learning in the context of deep architectures. One promising recent method for enabling semi-supervised learning in deep architectures has been formalized within Semi-Supervised Generative Adversarial Networks (SS-GANs) in the context of Computer Vision. In this paper, we adopt the SS-GAN framework to enable semi-supervised learning in the context of NLP. We demonstrate how an SS-GAN can boost the performance of simple architectures when operating in expressive low-dimensional embeddings; these are derived by combining the unsupervised approximation of linguistic Reproducing Kernel Hilbert Spaces and the so-called Universal Sentence Encoders. We experimentally evaluate the proposed approach on a semantic classification task, i.e., Question Classification, considering different sizes of training material and different numbers of target classes. By applying this adversarial schema to a simple Multi-Layer Perceptron, a classifier trained on a subset derived from 1% of the original training material achieves 92% accuracy. Moreover, when considering a complex classification schema, e.g., involving 50 classes, the proposed method outperforms state-of-the-art alternatives such as BERT.
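In the standard SS-GAN formulation referenced above, the discriminator extends a K-class classifier with an extra (K+1)-th "fake" class: labeled examples contribute a supervised cross-entropy term, while unlabeled real examples are pushed away from the fake class and generated samples toward it. The following is a minimal numpy sketch of that combined loss, not the authors' implementation; the function name and the smoothing constant are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def ssgan_discriminator_loss(logits_labeled, labels, logits_unlabeled, logits_fake):
    """SS-GAN discriminator loss over K real classes plus one 'fake' class.

    Each logits array has K+1 columns; column K is the fake class.
    (Hypothetical sketch of the standard SS-GAN objective.)
    """
    K = logits_labeled.shape[1] - 1
    # Supervised term: cross-entropy on the labeled examples' true classes.
    p_lab = softmax(logits_labeled)
    l_sup = -np.mean(np.log(p_lab[np.arange(len(labels)), labels] + 1e-12))
    # Unsupervised term: real unlabeled data should NOT fall in class K (fake)...
    p_unl = softmax(logits_unlabeled)
    l_real = -np.mean(np.log(1.0 - p_unl[:, K] + 1e-12))
    # ...while generated samples SHOULD fall in class K.
    p_fake = softmax(logits_fake)
    l_fake = -np.mean(np.log(p_fake[:, K] + 1e-12))
    return l_sup + l_real + l_fake
```

In a real training loop these logits would come from the discriminator network evaluated on labeled, unlabeled, and generator-produced batches, respectively.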

Technologies ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 2
Author(s):  
Ashish Jaiswal ◽  
Ashwin Ramesh Babu ◽  
Mohammad Zaki Zadeh ◽  
Debapriya Banerjee ◽  
Fillia Makedon

Self-supervised learning has gained popularity because of its ability to avoid the cost of annotating large-scale datasets. It adopts self-defined pseudolabels as supervision and uses the learned representations for several downstream tasks. Specifically, contrastive learning has recently become a dominant component in self-supervised learning for computer vision, natural language processing (NLP), and other domains. It aims to embed augmented versions of the same sample close to each other while pushing away the embeddings of different samples. This paper provides an extensive review of self-supervised methods that follow the contrastive approach. The work explains commonly used pretext tasks in a contrastive learning setup, followed by the different architectures that have been proposed so far. Next, we present a performance comparison of different methods on multiple downstream tasks such as image classification, object detection, and action recognition. Finally, we conclude with the limitations of the current methods and the need for further techniques and future directions to make meaningful progress.
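The contrastive objective described above (pull augmentations of the same sample together, push different samples apart) is commonly instantiated as the NT-Xent loss popularized by SimCLR. As an illustrative sketch, not drawn from this survey's text, the batch loss can be computed as:

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) contrastive loss.

    z1[i] and z2[i] are embeddings of two augmented views of sample i;
    every other embedding in the batch serves as a negative.
    """
    z = np.concatenate([z1, z2])                      # (2N, d) stacked views
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize rows
    sim = z @ z.T / tau                               # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # positive index per row
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return np.mean(logsumexp - sim[np.arange(2 * n), pos])
```

The loss is lowest when each embedding's nearest neighbor in the batch is its own positive pair; the temperature `tau` controls how sharply hard negatives are penalized.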


2020 ◽  
Vol 31 (1) ◽  
Author(s):  
Andreas Bittracher ◽  
Stefan Klus ◽  
Boumediene Hamzi ◽  
Péter Koltai ◽  
Christof Schütte

We present a novel kernel-based machine learning algorithm for identifying the low-dimensional geometry of the effective dynamics of high-dimensional multiscale stochastic systems. Recently, the authors developed a mathematical framework for the computation of optimal reaction coordinates of such systems that is based on learning a parameterization of a low-dimensional transition manifold in a certain function space. In this article, we enhance this approach by embedding and learning this transition manifold in a reproducing kernel Hilbert space, exploiting the favorable properties of kernel embeddings. Under mild assumptions on the kernel, the manifold structure is shown to be preserved under the embedding, and distortion bounds can be derived. This leads to a more robust and more efficient algorithm compared to the previous parameterization approaches.
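The kernel embedding underlying this approach maps probability distributions (here, transition densities) to points in an RKHS, where distances between embedded distributions can be computed from kernel evaluations alone. A toy illustration of this idea, not the authors' algorithm, is the squared maximum mean discrepancy (MMD) between two empirical samples under a Gaussian kernel:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Squared MMD: RKHS distance between the kernel mean embeddings of
    the empirical distributions represented by samples X and Y."""
    return (gaussian_kernel(X, X, sigma).mean()
            - 2 * gaussian_kernel(X, Y, sigma).mean()
            + gaussian_kernel(Y, Y, sigma).mean())
```

Because the embedding is computed entirely from kernel values, the geometry of a set of embedded distributions (e.g., a transition manifold) can be analyzed without ever representing the RKHS elements explicitly.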


2013 ◽  
Vol 11 (05) ◽  
pp. 1350020 ◽  
Author(s):  
Hongwei Sun ◽  
Qiang Wu

We study the asymptotic properties of indefinite kernel networks with coefficient regularization and dependent sampling. The framework under investigation differs from classical kernel learning: the kernel function is not required to be positive definite, and the samples are allowed to be weakly dependent, with the dependence measured by a strong mixing condition. Using a new kernel decomposition technique introduced in [27], two reproducing kernel Hilbert spaces and their associated kernel integral operators are used to characterize the properties and learnability of the hypothesis function class. Capacity-independent error bounds and learning rates are deduced.
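In coefficient regularization, one learns a function of the form f(x) = Σ_i α_i K(x_i, x) by penalizing the Euclidean norm of the coefficient vector α rather than the RKHS norm of f, which is exactly what makes indefinite kernels admissible (no positive definiteness is ever used). A minimal sketch of the resulting least-squares problem, assumed for illustration and not taken from the paper, is:

```python
import numpy as np

def fit_coefficients(K, y, lam=0.1):
    """Coefficient-regularized kernel least squares:
    minimize ||K @ alpha - y||^2 + lam * m * ||alpha||^2.

    K is the m x m kernel matrix K[i, j] = K(x_i, x_j); it need not be
    positive definite, since only ||alpha||^2 is penalized.
    """
    m = len(y)
    # Normal equations of the regularized problem; K.T @ K + lam*m*I is
    # always symmetric positive definite, so the solve is well posed.
    return np.linalg.solve(K.T @ K + lam * m * np.eye(m), K.T @ y)
```

As with ridge regression, increasing `lam` shrinks the coefficient vector toward zero, trading training fit for stability; this holds regardless of the kernel's definiteness.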


2014 ◽  
Vol 9 (4) ◽  
pp. 827-931 ◽  
Author(s):  
Joseph A. Ball ◽  
Dmitry S. Kaliuzhnyi-Verbovetskyi ◽  
Cora Sadosky ◽  
Victor Vinnikov

2009 ◽  
Vol 80 (3) ◽  
pp. 430-453 ◽  
Author(s):  
Josef Dick

We give upper bounds on the Walsh coefficients of functions whose derivative of order at least one has bounded variation of fractional order. We also consider the Walsh coefficients of functions in periodic and nonperiodic reproducing kernel Hilbert spaces. Finally, we prove a lower bound showing that our results are best possible.
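The Walsh coefficients discussed above can be computed numerically from samples of a function on dyadic points via the fast Walsh–Hadamard transform. The following sketch (in natural/Hadamard ordering, with a 1/n normalization chosen here for illustration; it is not part of the paper) makes the object of study concrete:

```python
import numpy as np

def fwht(a):
    """Fast Walsh-Hadamard transform in natural (Hadamard) ordering.

    Input length must be a power of two. With the 1/n normalization,
    the output approximates the Walsh coefficients of a function
    sampled at n equally spaced dyadic points.
    """
    a = np.asarray(a, dtype=float).copy()
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            x, y = a[i:i + h].copy(), a[i + h:i + 2 * h].copy()
            a[i:i + h], a[i + h:i + 2 * h] = x + y, x - y  # butterfly step
        h *= 2
    return a / len(a)
```

Smoothness of the sampled function shows up as decay of these coefficients, which is precisely the behavior the upper and lower bounds above quantify.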

