Adapting Pre-trained Language Models to Rumor Detection on Twitter

Hamda Slimi; Ibrahim Bounhas; Yahya Slimani

doi:10.3897/jucs.65918

JUCS - Journal of Universal Computer Science ◽

10.3897/jucs.65918 ◽

2021 ◽

Vol 27 (10) ◽

pp. 1128-1148

Author(s):

Hamda Slimi ◽

Ibrahim Bounhas ◽

Yahya Slimani

Keyword(s):

State Of The Art ◽

Language Model ◽

Fine Tuning ◽

Language Models ◽

Fake News ◽

Timely Manner ◽

Fast Pace ◽

Social Media Platforms ◽

Rumor Detection ◽

Malicious Intent

Fake news has invaded social media platforms where false information is being propagated with malicious intent at a fast pace. These circumstances required the development of solutions to monitor and detect rumors in a timely manner. In this paper, we propose an approach that seeks to detect emerging and unseen rumors on Twitter by adapting a pre-trained language model, namely RoBERTa, to the task of rumor detection. A comparison against content-based characteristics has shown the capability of the model to surpass handcrafted features. Experimental results show that our approach outperforms state-of-the-art ones in all metrics and that the fine-tuning of RoBERTa led to richer word embeddings that consistently and significantly enhance the precision of rumor recognition.

MSA Transformer

10.1101/2021.02.12.430858 ◽

2021 ◽

Author(s):

Roshan Rao ◽

Jason Liu ◽

Robert Verkuil ◽

Joshua Meier ◽

John F. Canny ◽

...

Keyword(s):

Structure Learning ◽

State Of The Art ◽

Language Model ◽

Language Modeling ◽

Language Models ◽

Multiple Sequence ◽

Wide Margin ◽

Current State ◽

Individual Sequences ◽

And Function

Unsupervised protein language models trained across millions of diverse sequences learn structure and function of proteins. Protein language models studied to date have been trained to perform inference from individual sequences. The longstanding approach in computational biology has been to make inferences from a family of evolutionarily related sequences by fitting a model to each family independently. In this work we combine the two paradigms. We introduce a protein language model which takes as input a set of sequences in the form of a multiple sequence alignment. The model interleaves row and column attention across the input sequences and is trained with a variant of the masked language modeling objective across many protein families. The performance of the model surpasses current state-of-the-art unsupervised structure learning methods by a wide margin, with far greater parameter efficiency than prior state-of-the-art protein language models.

Fast, Small and Exact: Infinite-order Language Modelling with Compressed Suffix Trees

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00112 ◽

2016 ◽

Vol 4 ◽

pp. 477-490 ◽

Cited By ~ 4

Author(s):

Ehsan Shareghi ◽

Matthias Petri ◽

Gholamreza Haffari ◽

Trevor Cohn

Keyword(s):

Infinite Order ◽

State Of The Art ◽

Language Model ◽

Language Models ◽

Memory Usage ◽

Suffix Trees ◽

Construction Time ◽

Modest Increase ◽

Order Language ◽

Language Modelling

Efficient methods for storing and querying are critical for scaling high-order m-gram language models to large corpora. We propose a language model based on compressed suffix trees, a representation that is highly compact and can be easily held in memory, while supporting queries needed in computing language model probabilities on-the-fly. We present several optimisations which improve query runtimes up to 2500×, despite only incurring a modest increase in construction time and memory usage. For large corpora and high Markov orders, our method is highly competitive with the state-of-the-art KenLM package. It imposes much lower memory requirements, often by orders of magnitude, and has runtimes that are either similar (for training) or comparable (for querying).
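
The kind of on-the-fly query this abstract refers to can be illustrated with a small sketch. It is only an illustration under stated assumptions: it uses a plain, uncompressed suffix array rather than the compressed suffix tree of the paper, and it computes unsmoothed maximum-likelihood estimates rather than whatever smoothing the authors apply. The function names (`build_suffix_array`, `count`, `mle_probability`) and the toy corpus are hypothetical.

```python
def build_suffix_array(tokens):
    """Return the start positions of all suffixes, sorted lexicographically."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def _bound(tokens, sa, pattern, strict):
    """Binary search over the suffix array.

    strict=False: first index whose length-m prefix is >= pattern.
    strict=True:  first index whose length-m prefix is >  pattern.
    """
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = tokens[sa[mid]:sa[mid] + m]
        if prefix < pattern or (strict and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def count(tokens, sa, pattern):
    """Count occurrences of a token sequence of any length (any m-gram order)."""
    return _bound(tokens, sa, pattern, True) - _bound(tokens, sa, pattern, False)


def mle_probability(tokens, sa, context, word):
    """Unsmoothed estimate P(word | context) = count(context + word) / count(context)."""
    denominator = count(tokens, sa, context)
    if denominator == 0:
        return 0.0
    return count(tokens, sa, context + [word]) / denominator


# Toy corpus (an assumption for illustration only).
tokens = "the cat sat on the mat the cat ran".split()
sa = build_suffix_array(tokens)
p_cat_given_the = mle_probability(tokens, sa, ["the"], "cat")
```

Because counts are looked up by binary search at query time, no table of fixed-order m-grams has to be precomputed, which is the property that lets the Markov order grow without bound. A production system would replace the sorted list of positions with a compressed index.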

Rumor Detection Based on SAGNN: Simplified Aggregation Graph Neural Networks

10.20944/preprints202011.0511.v1 ◽

2020 ◽

Author(s):

Liang Zhang ◽

Jingqun Li ◽

Bin Zhou ◽

Yan Jia

Keyword(s):

Neural Networks ◽

Numerical Experiments ◽

State Of The Art ◽

Fake News ◽

Wide Spread ◽

Convolutional Networks ◽

The Media ◽

Graph Neural Networks ◽

Rumor Detection ◽

Simple Network

Identifying fake news in the media has been an important issue. This is especially true considering the wide spread of rumors on popular social networks such as Twitter. Various kinds of techniques have been proposed to detect rumors. In this work, we study the application of graph neural networks to the task of rumor detection, and present a simplified new architecture to classify rumors. Numerical experiments show that the proposed simple network achieves performance comparable to or even better than state-of-the-art graph convolutional networks, while significantly reducing the computational complexity.

Analysis and Evaluation of Language Models for Word Sense Disambiguation

Computational Linguistics ◽

10.1162/coli_a_00405 ◽

2021 ◽

pp. 1-55

Author(s):

Daniel Loureiro ◽

Kiamehr Rezaee ◽

Mohammad Taher Pilehvar ◽

Jose Camacho-Collados

Keyword(s):

Feature Extraction ◽

Word Sense Disambiguation ◽

Language Model ◽

Training Data ◽

Fine Tuning ◽

Language Models ◽

Coarse Grained ◽

Word Sense ◽

Sense Disambiguation ◽

High Level

Transformer-based language models have taken many fields in NLP by storm. BERT and its derivatives dominate most of the existing evaluation benchmarks, including those for Word Sense Disambiguation (WSD), thanks to their ability in capturing context-sensitive semantic nuances. However, there is still little knowledge about their capabilities and potential limitations in encoding and recovering word senses. In this article, we provide an in-depth quantitative and qualitative analysis of the celebrated BERT model with respect to lexical ambiguity. One of the main conclusions of our analysis is that BERT can accurately capture high-level sense distinctions, even when a limited number of examples is available for each word sense. Our analysis also reveals that in some cases language models come close to solving coarse-grained noun disambiguation under ideal conditions in terms of availability of training data and computing resources. However, this scenario rarely occurs in real-world settings and, hence, many practical challenges remain even in the coarse-grained setting. We also perform an in-depth comparison of the two main language model based WSD strategies, i.e., fine-tuning and feature extraction, finding that the latter approach is more robust with respect to sense bias and it can better exploit limited available training data. In fact, the simple feature extraction strategy of averaging contextualized embeddings proves robust even using only three training sentences per word sense, with minimal improvements obtained by increasing the size of this training data.
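
The feature extraction strategy mentioned at the end of this abstract, averaging contextualized embeddings per sense, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the three-dimensional vectors stand in for real contextualized embeddings (which would come from a model such as BERT), the cosine-similarity nearest-sense rule is an assumption, and the function names and sense labels are hypothetical.

```python
import math


def mean_vector(vectors):
    """Average a list of equal-length vectors component by component."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_sense_embeddings(examples):
    """Map each sense to the average of its training-sentence embeddings."""
    return {sense: mean_vector(vecs) for sense, vecs in examples.items()}


def disambiguate(vector, sense_embeddings):
    """Pick the sense whose averaged embedding is most similar to the input."""
    return max(sense_embeddings, key=lambda s: cosine(vector, sense_embeddings[s]))


# Toy stand-ins for embeddings of "bank" in three training sentences per sense.
train = {
    "bank/finance": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.1]],
    "bank/river": [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3], [0.2, 1.0, 0.1]],
}
senses = build_sense_embeddings(train)
prediction = disambiguate([0.7, 0.3, 0.0], senses)
```

No model weights are updated in this setup: the language model only supplies vectors, which is what distinguishes feature extraction from fine-tuning.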

Detection of Manipulated Face Videos over Social Networks: A Large-Scale Study

Journal of Imaging ◽

10.3390/jimaging7100193 ◽

2021 ◽

Vol 7 (10) ◽

pp. 193

Author(s):

Federico Marcon ◽

Cecilia Pasquini ◽

Giulia Boato

Keyword(s):

Large Scale ◽

State Of The Art ◽

Forensic Analysis ◽

General Purpose ◽

Fine Tuning ◽

Specific Technique ◽

Multimedia Forensics ◽

Shared Data ◽

Social Media Platforms ◽

The Web

The detection of manipulated videos represents a highly relevant problem in multimedia forensics, which has been widely investigated in recent years. However, a common trait of published studies is the fact that the forensic analysis is typically applied to data prior to their potential dissemination over the web. This work addresses the challenging scenario where manipulated videos are first shared through social media platforms and then subjected to forensic analysis. In this context, a large-scale performance evaluation has been carried out involving general-purpose deep networks and state-of-the-art manipulated data, and studying different effects. Results confirm that a performance drop is observed in every case when unseen shared data are tested by networks trained on non-shared data; however, fine-tuning operations can mitigate this problem. Also, we show that the output of differently trained networks can carry useful forensic information for the identification of the specific technique used for visual manipulation, both for shared and non-shared data.

Towards Combating Pandemic-Related Misinformation in Social Media

Advances in Data Mining and Database Management - Data Science Advancements in Pandemic and Outbreak Management ◽

10.4018/978-1-7998-6736-4.ch008 ◽

2021 ◽

pp. 140-158

Author(s):

Isa Inuwa-Dutse

Keyword(s):

Social Media ◽

Public Discourse ◽

Preventive Measures ◽

State Of The Art ◽

Fake News ◽

Social Distancing ◽

The Public ◽

Toxic Impact ◽

Online Social Media ◽

Social Media Platforms

Conventional preventive measures during pandemics include social distancing and lockdown. Such measures in the time of social media brought about a new set of challenges – vulnerability to the toxic impact of online misinformation is high. A case in point is COVID-19. As the virus propagates, so does the associated misinformation and fake news about it leading to an infodemic. Since the outbreak, there has been a surge of studies investigating various aspects of the pandemic. Of interest to this chapter are studies centering on datasets from online social media platforms where the bulk of the public discourse happens. The main goal is to support the fight against negative infodemic by (1) contributing a diverse set of curated relevant datasets; (2) offering relevant areas to study using the datasets; and (3) demonstrating how relevant datasets, strategies, and state-of-the-art IT tools can be leveraged in managing the pandemic.

Assessment of Word-Level Neural Language Models for Sentence Completion

Applied Sciences ◽

10.3390/app10041340 ◽

2020 ◽

Vol 10 (4) ◽

pp. 1340

Author(s):

Heewoong Park ◽

Jonghun Park

Keyword(s):

Language Model ◽

Fine Tuning ◽

Language Models ◽

Sentence Completion ◽

Korean Language ◽

Learning Framework ◽

Scholastic Aptitude ◽

Word Level ◽

Network Language ◽

Comprehensive Study

The task of sentence completion, which aims to infer the missing text of a given sentence, was carried out to assess the reading comprehension level of machines as well as humans. In this work, we conducted a comprehensive study of various approaches to sentence completion based on neural language models, which have been advanced in recent years. First, we revisited the recurrent neural network language model (RNN LM), achieving highly competitive results with an appropriate network structure and hyper-parameters. This paper presents a bidirectional version of RNN LM, which surpassed the previous best results on the Microsoft Research (MSR) Sentence Completion Challenge and the Scholastic Aptitude Test (SAT) sentence completion questions. In parallel with directly applying RNN LM to sentence completion, we also employed a supervised learning framework that fine-tunes a large pre-trained transformer-based LM with a few sentence-completion examples. By fine-tuning a pre-trained BERT model, this work established state-of-the-art results on the MSR and SAT sets. Furthermore, we performed similar experimentation on newly collected cloze-style questions in the Korean language. The experimental results reveal that simply applying the multilingual BERT models to the Korean dataset was not satisfactory, which leaves room for further research.

MenuNER: Domain-Adapted BERT Based NER Approach for a Domain with Limited Dataset and Its Application to Food Menu Domain

Applied Sciences ◽

10.3390/app11136007 ◽

2021 ◽

Vol 11 (13) ◽

pp. 6007

Author(s):

Muzamil Hussain Syed ◽

Sun-Tae Chung

Keyword(s):

Domain Adaptation ◽

Language Model ◽

Named Entity Recognition ◽

Word Embedding ◽

Fine Tuning ◽

Entity Recognition ◽

Language Models ◽

Feature Vectors ◽

Named Entity ◽

Domain Specific

Entity-based information extraction is one of the main applications of Natural Language Processing (NLP). Recently, deep transfer learning utilizing contextualized word embeddings from pre-trained language models has shown remarkable results for many NLP tasks, including named-entity recognition (NER). BERT (Bidirectional Encoder Representations from Transformers) is gaining prominent attention among various contextualized word embedding models as a state-of-the-art pre-trained language model. It is quite expensive to train a BERT model from scratch for a new application domain since it needs a huge dataset and enormous computing time. In this paper, we focus on menu entity extraction from online user reviews of restaurants and propose a simple but effective approach to the NER task in a new domain where a large dataset is rarely available or difficult to prepare, such as the food menu domain. The approach is based on a domain adaptation technique for word embeddings and on fine-tuning the popular NER network model ‘Bi-LSTM+CRF’ with extended feature vectors. The proposed NER approach (named ‘MenuNER’) is a two-step process: (1) domain adaptation for the target domain, that is, further pre-training of the off-the-shelf BERT language model (BERT-base) in semi-supervised fashion on a domain-specific dataset, and (2) supervised fine-tuning of the popular Bi-LSTM+CRF network for the downstream task with extended feature vectors obtained by concatenating the word embedding from the domain-adapted pre-trained BERT model from the first step, the character embedding, and POS tag feature information. Experimental results on a handcrafted food menu corpus from a customers’ review dataset show that our proposed approach for the domain-specific NER task, that is, food menu named-entity recognition, performs significantly better than the one based on the baseline off-the-shelf BERT-base model. The proposed approach achieves a 92.5% F1 score on the YELP dataset for the MenuNER task.

Transformer-Based Language Model Fine-Tuning Methods for COVID-19 Fake News Detection

Combating Online Hostile Posts in Regional Languages during Emergency Situation - Communications in Computer and Information Science ◽

10.1007/978-3-030-73696-5_9 ◽

2021 ◽

pp. 83-92

Author(s):

Ben Chen ◽

Bin Chen ◽

Dehong Gao ◽

Qijin Chen ◽

Chengfu Huo ◽

...

Keyword(s):

Language Model ◽

Fine Tuning ◽

Fake News ◽

Tuning Methods

When Low Resource NLP Meets Unsupervised Language Model: Meta-Pretraining then Meta-Learning for Few-Shot Text Classification (Student Abstract)

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i10.7158 ◽

2020 ◽

Vol 34 (10) ◽

pp. 13773-13774

Author(s):

Shumin Deng ◽

Ningyu Zhang ◽

Zhanlin Sun ◽

Jiaoyan Chen ◽

Huajun Chen

Keyword(s):

Text Classification ◽

State Of The Art ◽

Language Model ◽

Language Models ◽

Generic Model ◽

Effective Strategy ◽

Linguistic Features ◽

Meta Learning ◽

Promising Solution ◽

Model Initialization

Text classification tends to be difficult when data are deficient or when it is required to adapt to unseen classes. In such challenging scenarios, recent studies have often used meta-learning to simulate the few-shot task, thus negating implicit common linguistic features across tasks. This paper addresses such problems using meta-learning and unsupervised language models. Our approach is based on the insight that having a good generalization from a few examples relies on both a generic model initialization and an effective strategy for adapting this model to newly arising tasks. We show that our approach is not only simple but also produces a state-of-the-art performance on a well-studied sentiment classification dataset. It can thus be further suggested that pretraining could be a promising solution for few-shot learning of many other NLP tasks. The code and the dataset to replicate the experiments are made available at https://github.com/zxlzr/FewShotNLP.
