Reverse Transfer Learning: Can Word Embeddings Trained for Different NLP Tasks Improve Neural Language Models?

BACKGROUND: Mental illness affects a significant portion of the worldwide population. Online mental health forums can provide a supportive environment for those afflicted and also generate a large amount of data that can be mined to predict mental health states using machine learning methods.

OBJECTIVE: This study aimed to benchmark multiple methods of text feature representation for social media posts and compare their downstream use with automated machine learning (AutoML) tools. We tested on datasets that contain posts labeled for perceived suicide risk or moderator attention in the context of self-harm. Specifically, we assessed the ability of the methods to prioritize posts that a moderator would identify for immediate response.

METHODS: We used 1588 labeled posts from the Computational Linguistics and Clinical Psychology (CLPsych) 2017 shared task collected from the Reachout.com forum. Posts were represented using lexicon-based tools, including Valence Aware Dictionary and sEntiment Reasoner, Empath, and Linguistic Inquiry and Word Count, and also using pretrained artificial neural network models, including DeepMoji, Universal Sentence Encoder, and Generative Pretrained Transformer-1 (GPT-1). We used Tree-based Optimization Tool and Auto-Sklearn as AutoML tools to generate classifiers to triage the posts.

RESULTS: The top-performing system used features derived from the GPT-1 model, which was fine-tuned on over 150,000 unlabeled posts from Reachout.com. Our top system had a macroaveraged F1 score of 0.572, providing a new state-of-the-art result on the CLPsych 2017 task. This was achieved without additional information from metadata or preceding posts. Error analyses revealed that this top system often misses expressions of hopelessness. In addition, we have presented visualizations that aid in the understanding of the learned classifiers.

CONCLUSIONS: In this study, we found that transfer learning is an effective strategy for predicting risk with relatively little labeled data and noted that fine-tuning of pretrained language models provides further gains when large amounts of unlabeled text are available.
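The abstract above reports a macroaveraged F1 score. As a rough illustration of that metric only, the sketch below computes a per-class F1 and averages it with equal weight per class. The function name `macro_f1` and the toy labels are assumptions made for this example; they are not taken from the study, and the shared task's exact choice of which classes to average over is not reproduced here.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, for illustration only
y_true = ["a", "a", "b", "c"]
y_pred = ["a", "b", "b", "c"]
print(round(macro_f1(y_true, y_pred), 4))
```

Because every class contributes equally to the average, a rare class that is classified poorly pulls the score down as much as a frequent one would, which is why the metric suits imbalanced triage data.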


Scientific Keyphrase Identification and Classification by Pre-Trained Language Models Intermediate Task Transfer Learning

2020. DOI: 10.18653/v1/2020.coling-main.472

Author(s): Seoyeon Park, Cornelia Caragea

Keyword(s): Transfer Learning, Language Models, Task Transfer


Better Word Representation Vectors Using Syllabic Alphabet: A Case Study of Swahili

Applied Sciences, 2019, Vol 9 (18), pp. 3648. DOI: 10.3390/app9183648

Author(s): Casper S. Shikali, Zhou Sijie, Liu Qihe, Refuoe Mokhosi

Keyword(s): Language Processing, Critical Role, Language Model, Central Africa, Spoken Language, Language Models, Word Embeddings, Word Representation

Deep learning has been used extensively in natural language processing, with sub-word representation vectors playing a critical role. However, this cannot be said of Swahili, a low-resource yet widely spoken language in East and Central Africa. This study proposed novel word embeddings from syllable embeddings (WEFSE) for Swahili to address the concern of word representation for agglutinative and syllabic-based languages. Inspired by the way Swahili is taught in beginner classes, we encoded the syllables of each word, rather than its characters, character n-grams, or morphemes, and generated quality word embeddings using a convolutional neural network. The quality of WEFSE was demonstrated by state-of-the-art results in the syllable-aware language model on both the small dataset (31.229 perplexity value) and the medium dataset (45.859 perplexity value), outperforming character-aware language models. We further evaluated the word embeddings using a word analogy task. To the best of our knowledge, syllabic alphabets have not previously been used to compose word representation vectors. Therefore, the main contributions of the study are a syllabic alphabet, WEFSE, a syllable-aware language model, and a word analogy dataset for Swahili.
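The results above are reported as perplexity values. For readers unfamiliar with the metric, the following is a minimal sketch of how perplexity is conventionally derived from the probabilities a language model assigns to each token in a held-out text. The function name `perplexity` and the example probabilities are assumptions for illustration and are unrelated to the study's data or code.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-probability per token."""
    if not token_probs:
        raise ValueError("token_probs must not be empty")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("each probability must lie in (0, 1]")
    mean_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_nll)


# Hypothetical probabilities a model assigned to four consecutive tokens
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower values are better: a model that assigns probability 0.25 to every token is as uncertain as a uniform choice among four options, so its perplexity is about 4.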


Automated Paraphrase Quality Assessment Using Language Models and Transfer Learning

Computers, 2021, Vol 10 (12), pp. 166. DOI: 10.3390/computers10120166

Author(s): Bogdan Nicula, Mihai Dascalu, Natalie N. Newton, Ellen Orcutt, Danielle S. McNamara

Keyword(s): Quality Assessment, Transfer Learning, Network Models, Learning Task, Fine Tuning, Language Models, Neural Network Models, Writing Ability, Four Dimensions, Timely Feedback

Learning to paraphrase supports both writing ability and reading comprehension, particularly for less skilled learners. As such, educational tools that integrate automated evaluations of paraphrases can be used to provide timely feedback to enhance learner paraphrasing skills more efficiently and effectively. Paraphrase identification is a popular NLP classification task that involves establishing whether two sentences share a similar meaning. Paraphrase quality assessment is a slightly more complex task, in which pairs of sentences are evaluated in-depth across multiple dimensions. In this study, we focus on four dimensions: lexical, syntactical, semantic, and overall quality. Our study introduces and evaluates various machine learning models using handcrafted features combined with Extra Trees, Siamese neural networks using BiLSTM RNNs, and pretrained BERT-based models, together with transfer learning from a larger general paraphrase corpus, to estimate the quality of paraphrases across the four dimensions. Two datasets are considered for the tasks involving paraphrase quality: ULPC (User Language Paraphrase Corpus) containing 1998 paraphrases and a smaller dataset with 115 paraphrases based on children’s inputs. The paraphrase identification dataset used for the transfer learning task is the MSRP dataset (Microsoft Research Paraphrase Corpus) containing 5801 paraphrases. On the ULPC dataset, our BERT model improves upon the previous baseline by at least 0.1 in F1-score across the four dimensions. When using fine-tuning from ULPC for the children dataset, both the BERT and Siamese neural network models improve upon their original scores by at least 0.11 F1-score. The results of these experiments suggest that transfer learning using generic paraphrase identification datasets can be successful, while at the same time obtaining comparable results in fewer epochs.


Improving the learning of chemical-protein interactions from literature using transfer learning and specialized word embeddings

Database, 2018, Vol 2018. DOI: 10.1093/database/bay066. Cited By ~ 15

Author(s): P Corbett, J Boyle

Keyword(s): Transfer Learning, Protein Interactions, Word Embeddings


Intermediate-Task Transfer Learning with Pretrained Language Models: When and Why Does It Work?

2020. DOI: 10.18653/v1/2020.acl-main.467

Author(s): Yada Pruksachatkun, Jason Phang, Haokun Liu, Phu Mon Htut, Xiaoyi Zhang, ...

Keyword(s): Transfer Learning, Language Models, Task Transfer

