Do Not Have Enough Data? Deep Learning to the Rescue!

Ateret Anaby-Tavor; Boaz Carmeli; Esther Goldbraich; Amir Kantor; George Kour; Segev Shlomov; Naama Tepper; Naama Zwerdling

doi:10.1609/aaai.v34i05.6233

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6233 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7383-7390 ◽

Cited By ~ 4

Author(s):

Ateret Anaby-Tavor ◽

Boaz Carmeli ◽

Esther Goldbraich ◽

Amir Kantor ◽

George Kour ◽

...

Keyword(s):

Text Classification ◽

Data Augmentation ◽

State Of The Art ◽

Language Model ◽

Original Data ◽

Fine Tuning ◽

Initial Training ◽

Series Of Experiments ◽

Classification Tasks ◽

Trained Neural Network

Building on recent advances in natural language modeling and text generation, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to artificially synthesize new labeled data for supervised learning. We mainly focus on cases with scarce labeled data. Our method, referred to as language-model-based data augmentation (LAMBADA), involves fine-tuning a state-of-the-art language generator to a specific task through an initial training phase on the existing (usually small) labeled data. Using the fine-tuned model and given a class label, new sentences for the class are generated. Our process then filters these new sentences by using a classifier trained on the original data. In a series of experiments, we show that LAMBADA improves classifiers' performance on a variety of datasets. Moreover, LAMBADA significantly improves upon the state-of-the-art techniques for data augmentation, specifically those applicable to text classification tasks with little data.
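The filtering step described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how the classifier's output is used, so the rule here (keep a generated sentence only if the classifier agrees with the label it was generated for, then keep the most confident ones per class) is one plausible choice. The names `filter_synthetic` and `toy_classify` and the keyword lists are hypothetical stand-ins, not part of LAMBADA; a real setup would use a fine-tuned generator and a classifier trained on the original labeled data.

```python
from collections import defaultdict


def toy_classify(sentence):
    """Hypothetical stand-in for a classifier trained on the original data.

    Returns (predicted_label, confidence) using simple keyword counts.
    """
    words = sentence.lower().split()
    billing = sum(w in {"refund", "invoice", "charge"} for w in words)
    bug = sum(w in {"crash", "error", "freeze"} for w in words)
    total = billing + bug
    if total == 0:
        return "unknown", 0.0
    if billing >= bug:
        return "billing", billing / total
    return "bug", bug / total


def filter_synthetic(candidates, classify, keep_per_class):
    """Keep generated sentences whose predicted label matches the intended one.

    candidates: iterable of (intended_label, sentence) pairs from a generator.
    Assumed rule: rank agreeing sentences by confidence, keep the top ones.
    """
    by_label = defaultdict(list)
    for intended_label, sentence in candidates:
        predicted_label, confidence = classify(sentence)
        if predicted_label == intended_label:
            by_label[intended_label].append((confidence, sentence))
    kept = {}
    for label, scored in by_label.items():
        scored.sort(key=lambda pair: pair[0], reverse=True)
        kept[label] = [sentence for _, sentence in scored[:keep_per_class]]
    return kept


# Pretend these came from a label-conditioned generator.
generated = [
    ("billing", "please refund the charge on my invoice"),
    ("billing", "the app shows an error after the crash"),  # wrong class
    ("bug", "the app will freeze and crash"),
    ("bug", "crash after refund error"),  # agrees, but less confident
    ("bug", "hello there"),  # classifier has no opinion
]
kept = filter_synthetic(generated, toy_classify, keep_per_class=1)
```

The retained sentences would then be added to the original training set before retraining the classifier.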

Tomato pest classification using deep convolutional neural network with transfer learning, fine tuning and scratch learning

Intelligent Decision Technologies ◽

10.3233/idt-200192 ◽

2021 ◽

pp. 1-10

Author(s):

Gayatri Pattnaik ◽

Vimal K. Shrivastava ◽

K. Parvathi

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Transfer Learning ◽

Data Augmentation ◽

State Of The Art ◽

Deep Convolutional Neural Network ◽

Fine Tuning ◽

Tomato Plants ◽

Random Weights

Pests are a major threat to the economic growth of a country. Applying pesticide is the easiest way to control pest infection. However, excessive use of pesticide is hazardous to the environment. Recent advances in deep learning have paved the way for early detection and improved classification of pests in tomato plants, which will benefit farmers. This paper presents a comprehensive analysis of 11 state-of-the-art deep convolutional neural network (CNN) models with three configurations: transfer learning, fine-tuning and scratch learning. Training in transfer learning and fine-tuning starts from pre-trained weights, whereas random weights are used in scratch learning. In addition, data augmentation has been explored to improve performance. Our dataset consists of 859 tomato pest images from 10 categories. The results demonstrate that the highest classification accuracy, 94.87%, was achieved in the transfer learning approach by the DenseNet201 model with data augmentation.
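The three configurations compared in this abstract can be summarized as a small lookup. The abstract only states where the weights start (pre-trained or random). The split between transfer learning (backbone frozen, only the new classification head trained) and fine-tuning (all layers trained) is the common convention and is an assumption here, not something the abstract confirms. `TrainingSetup` and `setup_for` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    init_weights: str     # "pretrained" or "random" (stated in the abstract)
    train_backbone: bool  # assumption: frozen only in transfer learning
    train_head: bool      # the new classification layer is always trained


def setup_for(mode):
    """Map a configuration name to its weight initialization and trainable parts."""
    if mode == "transfer_learning":
        return TrainingSetup("pretrained", train_backbone=False, train_head=True)
    if mode == "fine_tuning":
        return TrainingSetup("pretrained", train_backbone=True, train_head=True)
    if mode == "scratch":
        return TrainingSetup("random", train_backbone=True, train_head=True)
    raise ValueError(f"unknown mode: {mode}")


setups = {m: setup_for(m) for m in ("transfer_learning", "fine_tuning", "scratch")}
```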

Data Augmentation Based on Distributed Expressions in Text Classification Tasks

10.18653/v1/w19-8304 ◽

2019 ◽

Author(s):

Sugawara Yu

Keyword(s):

Text Classification ◽

Data Augmentation ◽

Classification Tasks

Dual Adversarial Co-Learning for Multi-Domain Text Classification

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6115 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6438-6445

Author(s):

Yuan Wu ◽

Yuhong Guo

Keyword(s):

Text Classification ◽

State Of The Art ◽

Digital Data ◽

Classification Model ◽

Classification Models ◽

Learning Framework ◽

Good Classification ◽

Classification Tasks ◽

Multiple Domains ◽

Learned Features

With the advent of deep learning, the performance of text classification models has improved significantly. Nevertheless, successfully training a good classification model requires a sufficient amount of labeled data, and annotating data is always expensive and time-consuming. With the rapid growth of digital data, similar classification tasks typically occur in multiple domains, while the availability of labeled data can vary widely across domains. Some domains may have abundant labeled data, while others may have only a limited amount (or none). Meanwhile, text classification tasks are highly domain-dependent: a text classifier trained in one domain may not perform well in another. To address these issues, in this paper we propose a novel dual adversarial co-learning approach for multi-domain text classification (MDTC). The approach learns shared-private networks for feature extraction and deploys dual adversarial regularizations to align features across different domains and between labeled and unlabeled data simultaneously under a discrepancy-based co-learning framework, aiming to improve the classifiers' generalization capacity with the learned features. We conduct experiments on multi-domain sentiment classification datasets. The results show that the proposed approach achieves state-of-the-art MDTC performance.

Universal Language Model Fine-tuning for Text Classification

10.18653/v1/p18-1031 ◽

2018 ◽

Cited By ~ 289

Author(s):

Jeremy Howard ◽

Sebastian Ruder

Keyword(s):

Text Classification ◽

Language Model ◽

Fine Tuning ◽

Universal Language

When Low Resource NLP Meets Unsupervised Language Model: Meta-Pretraining then Meta-Learning for Few-Shot Text Classification (Student Abstract)

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i10.7158 ◽

2020 ◽

Vol 34 (10) ◽

pp. 13773-13774

Author(s):

Shumin Deng ◽

Ningyu Zhang ◽

Zhanlin Sun ◽

Jiaoyan Chen ◽

Huajun Chen

Keyword(s):

Text Classification ◽

State Of The Art ◽

Language Model ◽

Language Models ◽

Generic Model ◽

Effective Strategy ◽

Linguistic Features ◽

Meta Learning ◽

Promising Solution ◽

Model Initialization

Text classification tends to be difficult when data are deficient or when it is required to adapt to unseen classes. In such challenging scenarios, recent studies have often used meta-learning to simulate the few-shot task, thus negating implicit common linguistic features across tasks. This paper addresses such problems using meta-learning and unsupervised language models. Our approach is based on the insight that having a good generalization from a few examples relies on both a generic model initialization and an effective strategy for adapting this model to newly arising tasks. We show that our approach is not only simple but also produces a state-of-the-art performance on a well-studied sentiment classification dataset. It can thus be further suggested that pretraining could be a promising solution for few-shot learning of many other NLP tasks. The code and the dataset to replicate the experiments are made available at https://github.com/zxlzr/FewShotNLP.

Low-Resource Text Classification via Cross-Lingual Language Model Fine-Tuning

Lecture Notes in Computer Science - Chinese Computational Linguistics ◽

10.1007/978-3-030-63031-7_17 ◽

2020 ◽

pp. 231-246

Author(s):

Xiuhong Li ◽

Zhe Li ◽

Jiabao Sheng ◽

Wushour Slamu

Keyword(s):

Text Classification ◽

Language Model ◽

Fine Tuning ◽

Low Resource ◽

Cross Lingual

Active Learning for Effectively Fine-Tuning Transfer Learning to Downstream Task

ACM Transactions on Intelligent Systems and Technology ◽

10.1145/3446343 ◽

2021 ◽

Vol 12 (2) ◽

pp. 1-24

Author(s):

Md Abul Bashar ◽

Richi Nayak

Keyword(s):

Active Learning ◽

Transfer Learning ◽

Language Processing ◽

State Of The Art ◽

Language Model ◽

Ensemble Classifier ◽

Classification Performance ◽

Fine Tuning ◽

Linguistic Features ◽

Better Than

Language models (LMs) have become a common method of transfer learning in Natural Language Processing (NLP) tasks when working with small labeled datasets. An LM is pretrained on an easily available large unlabelled text corpus and is fine-tuned with the labelled data to apply to the target (i.e., downstream) task. As an LM is designed to capture the linguistic aspects of semantics, it can be biased to linguistic features. We argue that exposing an LM during fine-tuning to instances that capture diverse semantic aspects (e.g., topical, linguistic, semantic relations) present in the dataset will improve its performance on the underlying task. We propose a Mixed Aspect Sampling (MAS) framework to sample instances that capture different semantic aspects of the dataset and use an ensemble classifier to improve the classification performance. Experimental results show that MAS performs better than random sampling as well as state-of-the-art active learning models on abuse detection tasks, where it is hard to collect the labelled data for building an accurate classifier.

TransBERT

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3427669 ◽

2021 ◽

Vol 20 (1) ◽

pp. 1-20

Author(s):

Zhongyang Li ◽

Xiao Ding ◽

Ting Liu

Keyword(s):

Natural Language ◽

Large Scale ◽

State Of The Art ◽

Language Model ◽

Fine Tuning ◽

Action Prediction ◽

Target Task ◽

Language Knowledge ◽

Previous State ◽

Transfer Tasks

Recent advances, such as GPT, BERT, and RoBERTa, have shown success in incorporating a pre-trained transformer language model and fine-tuning operations to improve downstream NLP systems. However, this framework still has some fundamental problems in effectively incorporating supervised knowledge from other related tasks. In this study, we investigate a transferable BERT (TransBERT) training framework, which can transfer not only general language knowledge from large-scale unlabeled data but also specific kinds of knowledge from various semantically related supervised tasks, for a target task. Particularly, we propose utilizing three kinds of transfer tasks, including natural language inference, sentiment classification, and next action prediction, to further train BERT based on a pre-trained model. This enables the model to get a better initialization for the target task. We take story-ending prediction as the target task to conduct experiments. The final results of 96.0% and 95.0% accuracy on two versions of Story Cloze Test datasets dramatically outperform previous state-of-the-art baseline methods. Several comparative experiments give some helpful suggestions on how to select transfer tasks to improve BERT. Furthermore, experiments on six English and three Chinese datasets show that TransBERT generalizes well to other tasks, languages, and pre-trained models.

Generative Pre-Training from Molecules

10.33774/chemrxiv-2021-5fwjd ◽

2021 ◽

Author(s):

Sanjar Adilov

Keyword(s):

Language Processing ◽

State Of The Art ◽

Language Model ◽

Molecular Data ◽

Fine Tuning ◽

Model Parameters ◽

Property Prediction ◽

Machine Learning Methods ◽

Recent Success ◽

Language Construct

SMILES is a line notation for entering and representing molecules. Being inherently a language construct, it allows estimating molecular data in a self-supervised fashion by employing machine learning methods for natural language processing (NLP). The recent success of attention-based neural networks in NLP has made large-corpora transformer pretraining a de facto standard for learning representations and transferring knowledge to downstream tasks. In this work, we attempt to adapt transformer capabilities to a large SMILES corpus by constructing a GPT-2-like language model. We experimentally show that a pretrained causal transformer captures general knowledge that can be successfully transferred to such downstream tasks as focused molecule generation and single-/multi-output molecular-property prediction. For each task, we freeze model parameters and attach trainable lightweight networks (adapters) between attention blocks as an alternative to fine-tuning. With a relatively modest setup, our transformer outperforms the recently proposed ChemBERTa transformer and approaches state-of-the-art MoleculeNet and Chemprop results. Overall, transformers pretrained on SMILES corpora are promising alternatives that do not require handcrafted feature engineering, make few assumptions about the structure of the data, and scale well with the pretraining data size.
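To make the adapter idea in this abstract concrete, here is a minimal dependency-free sketch. It assumes the widely used bottleneck form (project down, apply a nonlinearity, project up, add the result back to the input); the abstract does not specify the adapter architecture, so the layer shapes, the ReLU, and the function names are illustrative assumptions only.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, x) for x in vector]


def adapter(hidden, w_down, w_up):
    """Bottleneck adapter with a residual connection (assumed form).

    hidden: frozen-model activation of size d
    w_down: r x d trainable projection into a small bottleneck
    w_up:   d x r trainable projection back to size d
    """
    bottleneck = relu(matvec(w_down, hidden))
    update = matvec(w_up, bottleneck)
    return [h + u for h, u in zip(hidden, update)]


hidden = [1.0, -2.0, 0.5, 3.0]
w_down = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0]]

# With a zero up-projection the adapter is the identity, so inserting it
# leaves the frozen model's behavior unchanged before any training.
w_up_zero = [[0.0, 0.0] for _ in range(4)]
unchanged = adapter(hidden, w_down, w_up_zero)

# After training, only w_down and w_up would differ; the base model stays frozen.
w_up = [[0.5, 0.0], [0.0, 0.5], [0.0, 0.0], [1.0, 0.0]]
adapted = adapter(hidden, w_down, w_up)
```

Because only the two small projection matrices are trained per task, the number of task-specific parameters stays small compared with fine-tuning the whole model.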

Utilizing Indonesian Universal Language Model Fine-tuning for Text Classification

Journal of Information Technology and Computer Science ◽

10.25126/jitecs.202053215 ◽

2021 ◽

Vol 5 (3) ◽

pp. 325

Author(s):

Hendra Bunyamin

Keyword(s):

Computer Vision ◽

Natural Language Processing ◽

Transfer Learning ◽

Language Processing ◽

Text Classification ◽

Language Model ◽

Fine Tuning ◽

Classification Task ◽

Universal Language ◽

Learning Technique

Inductive transfer learning has made a huge impact on the computer vision field. In particular, computer vision applications, including object detection, classification, and segmentation, are rarely trained from scratch; instead, they are fine-tuned from pretrained models, which are products of learning from huge datasets. In contrast to computer vision, state-of-the-art natural language processing models are still generally trained from the ground up. Accordingly, this research investigates the adoption of transfer learning for natural language processing. Specifically, we use a transfer learning technique called Universal Language Model Fine-tuning (ULMFiT) for an Indonesian news text classification task. The dataset for constructing the language model was collected from several news providers from January to December 2017, whereas the dataset used for the text classification task comes from news articles provided by the Agency for the Assessment and Application of Technology (BPPT). To examine the impact of ULMFiT, we provide a baseline that is a vanilla neural network with two hidden layers. Although the performance of ULMFiT on the validation set is lower than that of our baseline, we find that ULMFiT significantly reduces overfitting, that is, the difference between training and validation accuracies, from 4% to nearly zero.