Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification

Traditional supervised classification algorithms require a large number of labelled examples to perform accurately. Semi-supervised classification algorithms attempt to overcome this major limitation by also using unlabelled examples. Unlabelled examples have also been used to improve nearest neighbour text classification in a method called bridging. In this paper, we propose the use of bridging in a semi-supervised setting. We introduce a new bridging algorithm that can be used as a base classifier in most semi-supervised approaches. We empirically show that the classification performance of two semi-supervised algorithms, self-learning and co-training, improves with the use of our new bridging algorithm in comparison to using the standard classifier, JRipper. We propose a similarity metric for short texts and also study the performance of self-learning with a number of instance selection heuristics.

Download Full-text

Robust Neural Text Classification and Entailment via Mixup Regularized Adversarial Training

Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval ◽

10.1145/3404835.3463122 ◽

2021 ◽

Author(s):

Jiahao Zhao ◽

Penghui Wei ◽

Wenji Mao

Keyword(s):

Text Classification ◽

Adversarial Training

Download Full-text

Text Classification by Contrastive Learning and Cross-lingual Data Augmentation for Alzheimer’s Disease Detection

10.18653/v1/2020.coling-main.542 ◽

2020 ◽

Author(s):

Zhiqiang Guo ◽

Zhaoci Liu ◽

Zhenhua Ling ◽

Shijin Wang ◽

Lingjing Jin ◽

...

Keyword(s):

Alzheimer’S Disease ◽

Alzheimer's Disease ◽

Text Classification ◽

Data Augmentation ◽

Disease Detection ◽

Cross Lingual

Download Full-text

Data Augmentation with Adversarial Training for Cross-Lingual NLI

10.18653/v1/2021.acl-long.401 ◽

2021 ◽

Author(s):

Xin Dong ◽

Yaxin Zhu ◽

Zuohui Fu ◽

Dongkuan Xu ◽

Gerard de Melo

Keyword(s):

Data Augmentation ◽

Adversarial Training ◽

Cross Lingual

Download Full-text

Lightweight Random Indexing for Polylingual Text Classification

Journal of Artificial Intelligence Research ◽

10.1613/jair.5194 ◽

2016 ◽

Vol 57 ◽

pp. 151-185 ◽

Cited By ~ 1

Author(s):

Alejandro Moreo Fernández ◽

Andrea Esuli ◽

Fabrizio Sebastiani

Keyword(s):

Machine Translation ◽

Text Classification ◽

Classification Scheme ◽

Natural Languages ◽

Random Indexing ◽

Translation Tools ◽

Effectiveness And Efficiency ◽

Multilingual Text ◽

Cross Lingual ◽

One Machine

Multilingual Text Classification (MLTC) is a text classification task in which documents are written each in one among a set L of natural languages, and in which all documents must be classified under the same classification scheme, irrespective of language. There are two main variants of MLTC, namely Cross-Lingual Text Classification (CLTC) and Polylingual Text Classification (PLTC). In PLTC, which is the focus of this paper, we assume (differently from CLTC) that for each language in L there is a representative set of training documents; PLTC consists of improving the accuracy of each of the |L| monolingual classifiers by also leveraging the training documents written in the other (|L| − 1) languages. The obvious solution, consisting of generating a single polylingual classifier from the juxtaposed monolingual vector spaces, is usually infeasible, since the dimensionality of the resulting vector space is roughly |L| times that of a monolingual one, and is thus often unmanageable. As a response, the use of machine translation tools or multilingual dictionaries has been proposed. However, these resources are not always available, or are not always free to use. One machine-translation-free and dictionary-free method that, to the best of our knowledge, has never been applied to PLTC before, is Random Indexing (RI). We analyse RI in terms of space and time efficiency, and propose a particular configuration of it (that we dub Lightweight Random Indexing LRI). By running experiments on two well known public benchmarks, Reuters RCV1/RCV2 (a comparable corpus) and JRC-Acquis (a parallel one), we show LRI to outperform (both in terms of effectiveness and efficiency) a number of previously proposed machine-translation-free and dictionary-free PLTC methods that we use as baselines.

Download Full-text