Active Learning for Effectively Fine-Tuning Transfer Learning to Downstream Task

2021 ◽  
Vol 12 (2) ◽  
pp. 1-24
Author(s):  
Md Abul Bashar ◽  
Richi Nayak

Language models (LMs) have become a common means of transfer learning in Natural Language Processing (NLP) when working with small labelled datasets. An LM is pretrained on an easily available large unlabelled text corpus and then fine-tuned with the labelled data of the target (i.e., downstream) task. Because an LM is designed to capture the linguistic aspects of semantics, it can be biased toward linguistic features. We argue that exposing an LM during fine-tuning to instances that capture the diverse semantic aspects (e.g., topical, linguistic, semantic relations) present in the dataset will improve its performance on the underlying task. We propose a Mixed Aspect Sampling (MAS) framework that samples instances capturing different semantic aspects of the dataset and uses an ensemble classifier to improve classification performance. Experimental results show that MAS outperforms both random sampling and state-of-the-art active learning models on abuse detection tasks, where it is hard to collect the labelled data needed to build an accurate classifier.
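The abstract does not give implementation details for MAS, but the sampling-then-ensemble idea can be sketched as follows: train one classifier per aspect-specific sample and combine their votes. The function name, the use of logistic regression, and majority voting are illustrative assumptions, not the authors' exact setup.

```python
# Hedged sketch: one classifier per semantic-aspect sample, majority vote.
import numpy as np
from sklearn.linear_model import LogisticRegression

def mixed_aspect_ensemble(aspect_samples, X_test):
    """aspect_samples: list of (X_train, y_train) pairs, one per aspect
    (e.g., topical, linguistic, relational samples); binary labels 0/1."""
    votes = []
    for X_train, y_train in aspect_samples:
        clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        votes.append(clf.predict(X_test))
    votes = np.stack(votes)                          # (n_aspects, n_test)
    return (votes.mean(axis=0) >= 0.5).astype(int)   # majority vote
```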

2021 ◽  
Author(s):  
Arousha Haghighian Roudsari ◽  
Jafar Afshar ◽  
Wookey Lee ◽  
Suan Lee

Abstract
Patent classification is an expensive and time-consuming task that has conventionally been performed by domain experts. However, the increase in the number of filed patents and the complexity of the documents make the classification task challenging. The text used in patent documents is not always written in a way that efficiently conveys knowledge. Moreover, patent classification is a multi-label classification task with a large number of labels, which makes the problem even more complicated. Hence, automating this expensive and laborious task is essential for assisting domain experts in managing patent documents and facilitating reliable search, retrieval, and further patent analysis. Transfer learning and pre-trained language models have recently achieved state-of-the-art results in many Natural Language Processing tasks. In this work, we investigate the effect of fine-tuning pre-trained language models, namely BERT, XLNet, RoBERTa, and ELECTRA, on the essential task of multi-label patent classification. We compare these models with the baseline deep-learning approaches used for patent classification, using various word embeddings to enhance the performance of the baseline models. The publicly available USPTO-2M patent classification benchmark and M-patent datasets are used for conducting experiments. We conclude that fine-tuning the pre-trained language models on patent text improves multi-label patent classification performance. Our findings indicate that XLNet performs best and achieves a new state-of-the-art classification performance with respect to precision, recall, F1 measure, coverage error, and LRAP.
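In the Hugging Face transformers library, the multi-label setup described here amounts to configuring a sequence-classification head with a sigmoid/BCE objective. A minimal sketch follows; the checkpoint name and label count are placeholders rather than the paper's exact configuration (XLNet, RoBERTa, or ELECTRA can be swapped in by changing the checkpoint).

```python
# Hedged sketch of multi-label fine-tuning with Hugging Face transformers.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

NUM_LABELS = 544  # hypothetical number of patent subclasses
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # uses BCEWithLogitsLoss
)

batch = tokenizer(["An apparatus comprising ..."], truncation=True,
                  return_tensors="pt")
labels = torch.zeros(1, NUM_LABELS)             # multi-hot label vector
labels[0, [12, 87]] = 1.0                       # this patent carries two labels
loss = model(**batch, labels=labels).loss
loss.backward()                                 # one fine-tuning step (optimizer omitted)
```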


2021 ◽  
Author(s):  
Sanjar Adilov

SMILES is a line notation for entering and representing molecules. Being inherently a language construct, it allows molecular data to be modeled in a self-supervised fashion using machine learning methods from natural language processing (NLP). The recent success of attention-based neural networks in NLP has made large-corpora transformer pretraining a de facto standard for learning representations and transferring knowledge to downstream tasks. In this work, we attempt to adapt transformer capabilities to a large SMILES corpus by constructing a GPT-2-like language model. We experimentally show that a pretrained causal transformer captures general knowledge that can be successfully transferred to downstream tasks such as focused molecule generation and single-/multi-output molecular-property prediction. For each task, we freeze the model parameters and attach trainable lightweight networks (adapters) between attention blocks as an alternative to fine-tuning. With a relatively modest setup, our transformer outperforms the recently proposed ChemBERTa transformer and approaches state-of-the-art MoleculeNet and Chemprop results. Overall, transformers pretrained on SMILES corpora are promising alternatives that do not require handcrafted feature engineering, make few assumptions about the structure of the data, and scale well with the pretraining data size.
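An adapter is a small bottleneck network inserted between the blocks of a frozen pretrained transformer, so that only the adapter weights are trained. The sketch below illustrates the shape of such a module in PyTorch; the bottleneck size and the use of the public `gpt2` checkpoint (rather than a SMILES-pretrained model) are illustrative assumptions.

```python
# Hedged sketch of a bottleneck adapter for a frozen GPT-2-like model.
import torch.nn as nn
from transformers import GPT2Model

class Adapter(nn.Module):
    def __init__(self, d_model: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.GELU()

    def forward(self, hidden):
        # residual connection: the frozen representation is only nudged
        return hidden + self.up(self.act(self.down(hidden)))

gpt2 = GPT2Model.from_pretrained("gpt2")   # stand-in for a SMILES-pretrained model
for p in gpt2.parameters():
    p.requires_grad = False                # freeze pretrained weights ...
adapters = nn.ModuleList(Adapter() for _ in gpt2.h)  # ... train one adapter per block
```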


2021 ◽  
Vol 5 (3) ◽  
pp. 325
Author(s):  
Hendra Bunyamin

Inductive transfer learning has made a huge impact on the field of computer vision. In particular, computer vision applications such as object detection, classification, and segmentation are rarely trained from scratch; instead, they are fine-tuned from pretrained models that are products of learning from huge datasets. In contrast, state-of-the-art natural language processing models are still generally trained from the ground up. Accordingly, this research investigates an adoption of the transfer learning technique for natural language processing. Specifically, we utilize a transfer learning technique called Universal Language Model Fine-tuning (ULMFiT) for an Indonesian news text classification task. The dataset for constructing the language model was collected from several news providers from January to December 2017, whereas the dataset employed for the text classification task comes from news articles provided by the Agency for the Assessment and Application of Technology (BPPT). To examine the impact of ULMFiT, we provide a baseline: a vanilla neural network with two hidden layers. Although the performance of ULMFiT on the validation set is lower than that of our baseline, we find that ULMFiT significantly reduces overfitting on the classification task, shrinking the gap between training and validation accuracies from 4% to nearly zero.
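The ULMFiT recipe (fine-tune a pretrained language model on target-domain text, then reuse its encoder for the classifier) can be sketched with the fastai library. The DataFrame `df` and its column names are placeholders, and fastai's default AWD_LSTM weights are English, whereas the paper builds an Indonesian language model.

```python
# Hedged sketch of the two ULMFiT stages with fastai.
from fastai.text.all import *

# Stage 1: fine-tune a pretrained language model on the news text.
dls_lm = TextDataLoaders.from_df(df, text_col="text", is_lm=True)
lm_learn = language_model_learner(dls_lm, AWD_LSTM)
lm_learn.fine_tune(3)
lm_learn.save_encoder("ft_encoder")

# Stage 2: reuse the fine-tuned encoder for news classification.
dls_clf = TextDataLoaders.from_df(df, text_col="text", label_col="category",
                                  text_vocab=dls_lm.vocab)
clf_learn = text_classifier_learner(dls_clf, AWD_LSTM)
clf_learn.load_encoder("ft_encoder")
clf_learn.fine_tune(5)
```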


2019 ◽  
Vol 11 (3) ◽  
pp. 280 ◽  
Author(s):  
Yongyong Fu ◽  
Kunkun Liu ◽  
Zhangquan Shen ◽  
Jinsong Deng ◽  
Muye Gan ◽  
...  

Impervious surfaces play an important role in urban planning and sustainable environmental management. High-spatial-resolution (HSR) images containing pure pixels have significant potential for the detailed delineation of land surfaces. However, due to high intraclass variability and low interclass distance, mapping and monitoring impervious surfaces in complex town–rural areas using HSR images remains a challenge. The fully convolutional network (FCN) model, a variant of convolutional neural networks (CNNs), recently achieved state-of-the-art performance in HSR image classification applications. However, due to the inherent nature of FCN processing, it is challenging for an FCN to precisely capture the detailed information of classification targets. To solve this problem, we propose an object-based deep CNN framework that integrates object-based image analysis (OBIA) with deep CNNs to accurately extract and estimate impervious surfaces. We also adopted two widely used transfer learning techniques to expedite the training of the deep CNNs. Finally, we compare our approach with conventional OBIA classification and with state-of-the-art FCN-based methods such as FCN-8s and U-Net, both of which are well designed for pixel-wise classification applications and have achieved great success. Our results show that the proposed approach effectively identified impervious surfaces, with 93.9% overall accuracy, a clear improvement over the existing OBIA, FCN-8s, and U-Net methods. Our findings also suggest that the classification performance of the proposed method depends on the training strategy: significantly higher accuracy is achieved through transfer learning by fine-tuning rather than by feature extraction. Our approach for the automatic extraction and mapping of impervious surfaces also lays a solid foundation for intelligent monitoring and management of land use and land cover.
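The two transfer strategies compared here differ in which pretrained weights are updated. A minimal sketch, assuming an ImageNet-pretrained ResNet-50 from torchvision as a stand-in for the authors' CNN:

```python
# Hedged sketch: feature extraction (frozen backbone) vs. fine-tuning (all layers).
import torch.nn as nn
from torchvision import models

def build(num_classes: int, finetune: bool) -> nn.Module:
    net = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    if not finetune:                         # feature extraction: freeze backbone
        for p in net.parameters():
            p.requires_grad = False
    net.fc = nn.Linear(net.fc.in_features, num_classes)  # new head, always trainable
    return net

extractor = build(2, finetune=False)         # impervious vs. pervious (illustrative)
finetuned = build(2, finetune=True)          # update all layers end-to-end
```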


Author(s):  
Amsal Pardamean ◽  
Hilman F. Pardede

Online media are currently the dominant source of information because they are not limited by time and place and offer fast, wide distribution. However, inaccurate news, often referred to as fake news, is a major problem in news dissemination for online media. Inaccurate news is information that is not true: it is engineered to cover up the real information and has no factual basis. Usually, inaccurate news has mass appeal and is presented in the guise of genuine and legitimate news to deceive readers or change their minds or opinions. Distinguishing inaccurate news from real news can be done with natural language processing (NLP) technologies. In this paper, we propose bidirectional encoder representations from transformers (BERT) for inaccurate news identification. BERT is a language model based on deep learning technologies and has been found effective for many NLP tasks. In this study, we use transfer learning and fine-tuning to adapt BERT to inaccurate news identification. The experiments show that our method achieves an accuracy of 99.23%, recall of 99.46%, precision of 98.86%, and F-score of 99.15%, considerably better than traditional methods for the same task.
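Adapting BERT to this task amounts to attaching a two-class classification head and fine-tuning end-to-end. A minimal sketch with Hugging Face transformers; the checkpoint, learning rate, label convention, and example text are assumptions rather than the paper's exact setup:

```python
# Hedged sketch: one fine-tuning step of BERT for binary fake-news detection.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)       # 0 = genuine, 1 = fake (assumed here)

optim = torch.optim.AdamW(model.parameters(), lr=2e-5)
batch = tok(["Breaking: miracle cure discovered ..."], truncation=True,
            padding=True, return_tensors="pt")
out = model(**batch, labels=torch.tensor([1]))   # cross-entropy loss on the head
out.loss.backward()
optim.step()
```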


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Michael Heinzinger ◽  
Ahmed Elnaggar ◽  
Yu Wang ◽  
Christian Dallago ◽  
Dmitrii Nechaev ◽  
...  

Abstract
Background
Predicting protein function and structure from sequence is one important challenge for computational biology. For 26 years, most state-of-the-art approaches have combined machine learning and evolutionary information. However, for some applications retrieving related proteins is becoming too time-consuming. Additionally, evolutionary information is less powerful for small families, e.g. for proteins from the Dark Proteome. Both of these problems are addressed by the new methodology introduced here.
Results
We introduced a novel way to represent protein sequences as continuous vectors (embeddings) by using the language model ELMo taken from natural language processing. By modeling protein sequences, ELMo effectively captured the biophysical properties of the language of life from unlabelled big data (UniRef50). We refer to these new embeddings as SeqVec (Sequence-to-Vector) and demonstrate their effectiveness by training simple neural networks for two different tasks. At the per-residue level, secondary structure (Q3 = 79% ± 1, Q8 = 68% ± 1) and regions with intrinsic disorder (MCC = 0.59 ± 0.03) were predicted significantly better than through one-hot encoding or through Word2vec-like approaches. At the per-protein level, subcellular localization was predicted in ten classes (Q10 = 68% ± 1) and membrane-bound proteins were distinguished from water-soluble proteins (Q2 = 87% ± 1). Although SeqVec embeddings generated the best predictions from single sequences, no solution improved over the best existing method using evolutionary information. Nevertheless, our approach improved over some popular methods that use evolutionary information and, for some proteins, even beat the best. Thus, the embeddings prove to condense the underlying principles of protein sequences. Overall, the important novelty is speed: where the lightning-fast HHblits needed on average about two minutes to generate the evolutionary information for a target protein, SeqVec created embeddings in 0.03 s on average. As this speed-up is independent of the size of growing sequence databases, SeqVec provides a highly scalable approach for the analysis of big data in proteomics, e.g. microbiome or metaproteome analysis.
Conclusion
Transfer learning succeeded in extracting information from unlabelled sequence databases relevant for various protein prediction tasks. SeqVec modeled the language of life, namely the principles underlying protein sequences, better than any features suggested by textbooks and prediction methods. The exception is evolutionary information; however, that information is not available at the level of a single sequence.
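SeqVec follows the ELMo interface from allennlp: a protein is embedded residue by residue, then summed over the three ELMo layers for per-residue tasks or additionally averaged over the sequence for per-protein tasks. A sketch, with the options/weights file paths as placeholders for the files distributed with SeqVec:

```python
# Hedged sketch of generating SeqVec embeddings (file paths are placeholders).
import numpy as np
from allennlp.commands.elmo import ElmoEmbedder

embedder = ElmoEmbedder(options_file="options.json", weight_file="weights.hdf5")
seq = "MVLSPADKTNVKAAW"                       # one protein; residues act as "words"
layers = embedder.embed_sentence(list(seq))   # numpy array, shape (3, len(seq), 1024)

per_residue = layers.sum(axis=0)              # e.g. input for secondary-structure nets
per_protein = per_residue.mean(axis=0)        # e.g. input for localization prediction
```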


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Young Jae Kim ◽  
Jang Pyo Bae ◽  
Jun-Won Chung ◽  
Dong Kyun Park ◽  
Kwang Gi Kim ◽  
...  

Abstract
Colorectal cancer occurs in the gastrointestinal tract and is the third most common of the 27 major types of cancer in South Korea and worldwide. Colorectal polyps are known to increase the risk of developing colorectal cancer, and detected polyps need to be resected to reduce that risk. This research improved the performance of polyp classification through the fine-tuning of a Network-in-Network (NIN) after applying a model pre-trained on the ImageNet database. Random shuffling was performed 20 times on 1000 colonoscopy images; each shuffle divided the data into 800 training images and 200 test images, and accuracy was evaluated on the 200 test images in each of the 20 experiments. Three comparison methods were constructed from AlexNet by transferring weights trained on three different state-of-the-art databases; a plain AlexNet-based method without transfer learning was also compared. The accuracy of the proposed method was higher, with statistical significance, than the accuracy of the four other state-of-the-art methods, and showed an 18.9% improvement over the plain AlexNet-based method. The area under the curve was approximately 0.930 ± 0.020, and the recall rate was 0.929 ± 0.029. Given its high recall rate and accuracy, this automatic algorithm can assist endoscopists in identifying adenomatous polyps and can enable the timely resection of polyps at an early stage.
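The evaluation protocol (20 random shuffles of 1000 images into 800/200 splits) maps directly onto scikit-learn's ShuffleSplit; the actual NIN fine-tuning inside each fold is omitted here:

```python
# Hedged sketch of the 20-shuffle, 800/200 evaluation protocol.
import numpy as np
from sklearn.model_selection import ShuffleSplit

splitter = ShuffleSplit(n_splits=20, train_size=800, test_size=200, random_state=0)
for fold, (train_idx, test_idx) in enumerate(splitter.split(np.arange(1000))):
    # fine-tune the ImageNet-pretrained NIN on images[train_idx] here,
    # then record accuracy on images[test_idx] (training loop omitted).
    assert len(train_idx) == 800 and len(test_idx) == 200
```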


2021 ◽  
pp. 1-10
Author(s):  
Gayatri Pattnaik ◽  
Vimal K. Shrivastava ◽  
K. Parvathi

Pests are a major threat to the economic growth of a country. Applying pesticide is the easiest way to control pest infection; however, excessive use of pesticide is hazardous to the environment. Recent advances in deep learning have paved the way for early detection and improved classification of pests in tomato plants, which will benefit farmers. This paper presents a comprehensive analysis of 11 state-of-the-art deep convolutional neural network (CNN) models in three configurations: transfer learning, fine-tuning, and scratch learning. Training in the transfer learning and fine-tuning configurations starts from pre-trained weights, whereas random weights are used in scratch learning. In addition, data augmentation has been explored to improve performance. Our dataset consists of 859 tomato pest images from 10 categories. The results demonstrate that the highest classification accuracy, 94.87%, was achieved by the DenseNet201 model in the transfer learning configuration with data augmentation.
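The three configurations differ only in weight initialization and in which layers are trainable. A sketch using torchvision's DenseNet201; the augmentation choices and the transfer/fine-tune split (frozen features vs. all layers trainable) are assumptions about the setup, not taken from the paper:

```python
# Hedged sketch: transfer learning vs. fine-tuning vs. scratch, plus augmentation.
import torch.nn as nn
from torchvision import models, transforms

augment = transforms.Compose([          # illustrative augmentation pipeline
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
])

def densenet201(mode: str, num_classes: int = 10) -> nn.Module:
    pretrained = mode in ("transfer", "finetune")
    net = models.densenet201(
        weights=models.DenseNet201_Weights.IMAGENET1K_V1 if pretrained else None)
    if mode == "transfer":              # pre-trained features kept frozen
        for p in net.features.parameters():
            p.requires_grad = False
    net.classifier = nn.Linear(net.classifier.in_features, num_classes)
    return net                          # "scratch": random weights, all trainable
```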


2021 ◽  
Vol 18 (2) ◽  
pp. 56-65
Author(s):  
Marcelo Romero ◽  
Matheus Gutoski ◽  
Leandro Takeshi Hattori ◽  
Manassés Ribeiro ◽  
...  

Transfer learning is a paradigm in which classifiers are trained and tested on datasets drawn from distinct distributions. This technique allows a particular problem to be solved using a model that was trained for another purpose. In recent years, this practice has become very popular due to the increasing availability of public pre-trained models that can be fine-tuned for different scenarios. However, the relationship between the datasets used for training the model and the test data is usually not addressed, especially when the fine-tuning process updates only the fully connected layers of a Convolutional Neural Network with pre-trained weights. This work presents a study of the relationship between the datasets used in a transfer learning process, in terms of the performance achieved by the models and of dataset complexities and similarities. For this purpose, we fine-tune the final layer of Convolutional Neural Networks with pre-trained weights using diverse soft-biometrics datasets. We evaluate the performance of the models when tested on datasets different from the one used for training, and complexity and similarity metrics are also used in the evaluation.
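The abstract does not specify which similarity metric is used. As one simple stand-in (an assumption, not the authors' measure), two datasets can be compared by the cosine similarity of their mean deep-feature vectors under a frozen pretrained backbone:

```python
# Hedged sketch: a crude dataset-similarity proxy via frozen ResNet-18 features.
import torch
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()        # expose the 512-d feature vector
backbone.eval()

@torch.no_grad()
def mean_feature(images: torch.Tensor) -> torch.Tensor:
    return backbone(images).mean(dim=0)  # images: (N, 3, 224, 224)

a = torch.randn(8, 3, 224, 224)          # random stand-ins for two image datasets
b = torch.randn(8, 3, 224, 224)
sim = torch.nn.functional.cosine_similarity(mean_feature(a), mean_feature(b), dim=0)
```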


Entropy ◽  
2020 ◽  
Vol 22 (10) ◽  
pp. 1164
Author(s):  
Kaushalya Madhawa ◽  
Tsuyoshi Murata

Current breakthroughs in the field of machine learning are fueled by the deployment of deep neural network models, which are notorious for their dependence on large amounts of labeled training data. Active learning is used as a solution for training classification models with fewer labeled instances by selecting only the most informative instances for labeling. This is especially important when labeled data are scarce or the labeling process is expensive. In this paper, we study the application of active learning to attributed graphs, where the data instances are represented as nodes of an attributed graph. Graph neural networks achieve the current state-of-the-art classification performance on attributed graphs, but their performance relies on the careful tuning of hyperparameters, usually performed using a validation set, i.e., an additional set of labeled instances. In label-scarce problems, it is more realistic to use all labeled instances for training the model rather than holding some out for validation. In this setting, we perform a fair comparison of the existing active learning algorithms proposed for graph neural networks as well as for other data types such as images and text. Our empirical results demonstrate that state-of-the-art active learning algorithms designed for other data types do not perform well on graph-structured data. We study the problem within the framework of the exploration-vs.-exploitation trade-off and propose a new count-based exploration term. With empirical evidence on multiple benchmark graphs, we highlight the importance of complementing uncertainty-based active learning models with an exploration term.
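An acquisition rule in this spirit scores each unlabeled node by prediction uncertainty plus an exploration bonus that shrinks as a node's neighborhood accumulates labels. The concrete bonus below (inverse square root of a visit count) is an illustrative assumption, not the paper's published formula:

```python
# Hedged sketch: uncertainty + count-based exploration for node selection.
import numpy as np

def acquisition(probs: np.ndarray, visit_counts: np.ndarray, beta: float = 1.0):
    """probs: (n_nodes, n_classes) GNN softmax outputs;
    visit_counts: (n_nodes,) labels already collected near each node."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)  # exploitation term
    bonus = beta / np.sqrt(1.0 + visit_counts)              # exploration term
    return entropy + bonus

probs = np.array([[0.5, 0.5], [0.9, 0.1]])
scores = acquisition(probs, visit_counts=np.array([0, 5]))
next_node = int(np.argmax(scores))       # query the highest-scoring node next
```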

