A Fine-Tuned BERT-Based Transfer Learning Approach for Text Classification

Journal of Healthcare Engineering ◽

10.1155/2022/3498123 ◽

2022 ◽

Vol 2022 ◽

pp. 1-17

Author(s):

Rukhma Qasim ◽

Waqas Haider Bangyal ◽

Mohammed A. Alqarni ◽

Abdulwahab Ali Almazroi

Keyword(s):

Data Mining ◽

Social Media ◽

Transfer Learning ◽

Language Processing ◽

Text Classification ◽

Hate Speech ◽

Classification Problem ◽

Learning Approaches ◽

Fake News ◽

Targeted Marketing

The text classification problem has been studied thoroughly in information retrieval and data mining. It is useful in many tasks, including medical diagnosis in the healthcare sector, targeted marketing, the entertainment industry, and group filtering. Recent innovations in data mining and natural language processing (NLP) have drawn researchers worldwide to developing automated text classification systems. NLP makes it possible to categorize documents containing different kinds of text, and social media users generate a huge amount of such data on social media sites. Three datasets are used in the experiments: the COVID-19 fake news dataset, the COVID-19 English tweet dataset, and the extremist-non-extremist dataset. Together they contain news blogs, posts, and tweets related to coronavirus and hate speech. Transfer learning approaches had not previously been tested on the COVID-19 fake news and extremist-non-extremist datasets. The proposed work therefore applies transfer learning classification models to both datasets to assess their performance. Models are trained and evaluated on accuracy, precision, recall, and F1-score, and heat maps are generated for every model. Finally, future directions are proposed.
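The four evaluation measures named in this abstract have standard definitions. The following minimal Python sketch computes them from true and predicted labels for a single positive class. It assumes binary labels. The abstract does not say whether the authors report per-class, macro-averaged, or weighted scores. The function name `classification_metrics` is hypothetical.

```python
from collections import Counter

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one positive class (sketch)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    # Count (is_actually_positive, is_predicted_positive) pairs.
    counts = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    tp = counts[(True, True)]
    fp = counts[(False, True)]
    fn = counts[(True, False)]
    tn = counts[(False, False)]
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Usage: 1 = fake, 0 = real (illustrative labels only).
print(classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

In this example there are two true positives, one false positive, one false negative, and two true negatives. Accuracy is therefore 4/6, and precision, recall, and F1 are each 2/3.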

Intelligent Detection of False Information in Arabic Tweets Utilizing Hybrid Harris Hawks Based Feature Selection and Machine Learning Models

Symmetry ◽

10.3390/sym13040556 ◽

2021 ◽

Vol 13 (4) ◽

pp. 556

Author(s):

Thaer Thaher ◽

Mahmoud Saheb ◽

Hamza Turabieh ◽

Hamouda Chantar

Keyword(s):

Machine Learning ◽

Social Media ◽

Feature Selection ◽

Language Processing ◽

User Profile ◽

Vital Role ◽

Classification Model ◽

Fake News ◽

False Information ◽

Social Media Platforms

Fake or false information on social media platforms is a significant challenge because it deliberately misleads users through rumors, propaganda, or deceptive information about a person, organization, or service. Twitter is one of the most widely used social media platforms, especially in the Arab region, where the number of users is steadily increasing along with the rate of fake news. This has drawn researchers to the goal of a safe online environment free of misleading information. This paper proposes a smart classification model for the early detection of fake news in Arabic tweets. The model combines Natural Language Processing (NLP) techniques, Machine Learning (ML) models, and the Harris Hawks Optimizer (HHO) as a wrapper-based feature selection approach. An Arabic Twitter corpus of 1,862 previously annotated tweets is used to assess the efficiency of the proposed model. A Bag of Words (BoW) model with different term-weighting schemes is used for feature extraction. Eight well-known learning algorithms are investigated with varying combinations of features, including user-profile, content-based, and word features. The reported results show that Logistic Regression (LR) with the Term Frequency-Inverse Document Frequency (TF-IDF) model achieves the best rank. Moreover, feature selection based on the binary HHO algorithm plays a vital role in reducing dimensionality, which enhances the learning model’s performance for fake news detection. Notably, the proposed BHHO-LR model yields an improvement of 5% over previous work on the same dataset.
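The abstract reports that TF-IDF weighting of a Bag-of-Words model worked best with Logistic Regression. The following minimal sketch computes TF-IDF weights using the Python standard library alone. It assumes the tweets are already tokenized and uses the plain variant tf × log(N / df). The paper's exact weighting scheme and its Arabic preprocessing are not given here. `tf_idf` is a hypothetical helper name.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Sketch using raw term frequency times idf = log(N / df),
    one of several common TF-IDF variants.
    """
    n_docs = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(doc))  # document frequency: count each term once per doc
    weights = []
    for doc in documents:
        tf = Counter(doc)  # raw term frequency within this document
        weights.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return weights

# Usage with three toy, pre-tokenized "tweets" (invented data).
docs = [["fake", "news", "covid"], ["real", "news"], ["fake", "fake", "claim"]]
for row in tf_idf(docs):
    print(row)
```

A term that occurs in every document gets weight 0. The weights therefore emphasize terms that distinguish one tweet from the others. scikit-learn's `TfidfVectorizer` uses a smoothed idf and L2 normalization by default, so its numbers differ from this plain variant.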

Analysis of Discourse in News About Homophobia, Racism, and Sexism in Comments on Brazilian News Portals

10.14210/cotb.v12.p467-474 ◽

2021 ◽

Author(s):

Lucas Rodrigues ◽

Antonio Jacob Junior ◽

Fábio Lobato

Keyword(s):

Social Media ◽

Natural Language Processing ◽

Sentiment Analysis ◽

Data Visualization ◽

Language Processing ◽

Topic Modeling ◽

Hate Speech ◽

Psychological Impact ◽

Internet Service ◽

General Law

Posts with defamatory content or hate speech are constantly found on social media. The consequences for readers are numerous. They are not restricted to the psychological impact but also include the growth of this social phenomenon. Under the General Law on the Protection of Personal Data and the Marco Civil da Internet, service providers became responsible for the content on their platforms. Given the importance of this issue, this paper analyzes the content (news and comments) published on the G1 News Portal. It uses techniques based on data visualization and Natural Language Processing, such as sentiment analysis and topic modeling. Most of the comments were neutral or negative, whether or not they were classified as hate speech. Even so, the results show that the majority of them were accepted by users.

Detection of Fake News on Social Media Using Classification Data Mining Techniques

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1637.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 3132-3138

Keyword(s):

Machine Learning ◽

Data Mining ◽

Social Media ◽

Information Exchange ◽

Learning Algorithm ◽

Daily Life ◽

Support Vector ◽

Machine Learning Algorithm ◽

Fake News ◽

Other Information

In today’s world, social media is one of the most important tools for communication. It helps people interact with each other and share their thoughts, knowledge, or any other information. Some of the most popular social media services are Facebook, Twitter, WhatsApp, and WeChat. Because social media has a large impact on people’s daily lives, it can also be used as a source of fake news or misinformation. Any information presented on social media should therefore be evaluated for its genuineness and originality. The evaluation should consider the probability that the information is correct and reliable, so that the information exchange can be trusted. In this work, we identify features that help predict whether a given tweet is a rumor or information. Two machine learning algorithms, Decision Tree and Support Vector Machine, are run in the WEKA tool for classification.
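The abstract does not say which decision tree variant was run in WEKA. Tree learners commonly choose splits by how much a split reduces label entropy. The sketch below computes that information gain for one categorical feature. It assumes a hypothetical binary `has_url` feature and "rumor"/"info" labels; neither comes from the paper. WEKA's J48, an implementation of C4.5, uses the related gain ratio, which additionally divides by the entropy of the split itself.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """Entropy reduction from splitting `labels` on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the child nodes after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Usage with invented data: does the tweet contain a URL (hypothetical feature)?
labels = ["rumor", "rumor", "info", "info"]
has_url = [True, True, False, False]
print(information_gain(labels, has_url))
```

A feature that separates the classes perfectly, as `has_url` does here, yields the full 1 bit of gain. A feature that leaves each branch evenly mixed yields 0.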

Detection of Economy-Related Turkish Tweets Based on Machine Learning Approaches

10.4018/978-1-7998-8413-2.ch008 ◽

2022 ◽

pp. 171-195

Author(s):

Jale Bektaş

Keyword(s):

Machine Learning ◽

Text Mining ◽

Text Classification ◽

Integration Method ◽

Classification Problem ◽

Feature Representation ◽

Learning Approaches ◽

Machine Learning Methods ◽

Linguistic Approach ◽

Turkish Language

NLP for Turkish is considerably harder than for other languages written in the Latin alphabet, such as English. In this study, text mining techniques are used to build a pre-processing framework. In this framework, TF-IDF values are computed following a linguistic approach on 7,731 tweets posted by 13 well-known economists in Turkey and retrieved from Twitter. The classification results of four common machine learning methods are then compared: SVM, Naive Bayes, LR, and an integration of LR with SVM. The TF-IDF features are tested with different n-gram ranges. The findings show that success on a text classification problem depends on the feature representation method. With unigram features, SVM outperforms the other ML methods. The best result, an accuracy of 82.9%, is obtained by integrating SVM with LR. These results show that these methods perform satisfactorily for Turkish.
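The abstract compares TF-IDF features built from different n-grams. The following is a minimal sketch of word n-gram extraction that assumes whitespace-tokenized input. The Turkish-specific linguistic preprocessing the authors describe is not reproduced. `word_ngrams` is a hypothetical helper. The example sentence "faiz oranı yükseldi" ("the interest rate rose") is invented for illustration.

```python
def word_ngrams(tokens, n_min=1, n_max=2):
    """Return all contiguous word n-grams with n_min <= n <= n_max (sketch)."""
    if n_min < 1 or n_max < n_min:
        raise ValueError("require 1 <= n_min <= n_max")
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of n tokens across the sequence.
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

tokens = "faiz oranı yükseldi".split()
print(word_ngrams(tokens))        # unigrams and bigrams
print(word_ngrams(tokens, 2, 2))  # bigrams only
```

The resulting n-gram strings can be fed into a TF-IDF weighting step in place of single words. Unigrams keep the vocabulary small, while bigrams capture short phrases such as "faiz oranı".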

New explainability method for BERT-based model in fake news detection

Scientific Reports ◽

10.1038/s41598-021-03100-6 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Mateusz Szczepański ◽

Marek Pawlicki ◽

Rafał Kozik ◽

Michał Choraś

Keyword(s):

Artificial Intelligence ◽

Social Media ◽

Language Processing ◽

High Performance ◽

Real Life ◽

Contemporary Society ◽

Fake News ◽

Interpretable Model ◽

Deep Integration ◽

The Impact

The ubiquity of social media and its deep integration into contemporary society have created new ways to interact, exchange information, form groups, or earn money, all on a scale never seen before. These possibilities, paired with widespread popularity, give social media considerable impact. Unfortunately, the benefits come at a cost. Various entities can use social media to spread disinformation, so-called ‘Fake News’, either to make a profit or to influence the behaviour of society. A diverse array of countermeasures has been devised to reduce the impact and spread of Fake News. These include linguistic-based approaches, which often use Natural Language Processing (NLP) and Deep Learning (DL). However, as the latest advancements in the Artificial Intelligence (AI) domain show, a model’s high performance is no longer enough. The explainability of the system’s decisions is equally crucial in real-life scenarios. The objective of this paper is therefore to present a novel explainability approach for BERT-based fake news detectors. The approach does not require extensive changes to the system and can be attached as an extension to operating detectors. For this purpose, two Explainable Artificial Intelligence (xAI) techniques, Local Interpretable Model-Agnostic Explanations (LIME) and Anchors, are used and evaluated on fake news data, i.e., short pieces of text such as tweets or headlines. The focus of this paper is the explainability approach for fake news detectors, as the detectors themselves were part of the authors' previous work.
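LIME explains a single prediction by perturbing the input and fitting a local, weighted linear surrogate model. Anchors searches for if-then rules that "anchor" the prediction so that changes elsewhere rarely alter it. The sketch below uses a much simpler stand-in technique called occlusion, which is neither LIME nor Anchors. It scores each word by how much deleting it lowers a classifier's fake-news probability. The `toy_fake_news_proba` scorer and its trigger words are hypothetical placeholders for a BERT-based detector. Any function that maps a token list to a probability can be passed in instead.

```python
from typing import Callable, List, Tuple

def occlusion_importance(tokens: List[str],
                         predict_proba: Callable[[List[str]], float]
                         ) -> List[Tuple[str, float]]:
    """Score each token by the drop in P(fake) when it is removed.

    Simple occlusion sketch; not LIME or Anchors.
    """
    base = predict_proba(tokens)
    scores = []
    for i, token in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]  # delete one token
        scores.append((token, base - predict_proba(reduced)))
    # Most influential tokens first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

def toy_fake_news_proba(tokens: List[str]) -> float:
    # Hypothetical stand-in for a BERT detector: each trigger word adds 0.3.
    triggers = {"miracle", "cure", "shocking"}
    hits = sum(token.lower() in triggers for token in tokens)
    return min(0.1 + 0.3 * hits, 1.0)

for token, score in occlusion_importance(["Shocking", "miracle", "cure", "found"],
                                         toy_fake_news_proba):
    print(f"{token}: {score:.2f}")
```

Removing any trigger word lowers the toy probability by about 0.3, while removing "found" changes nothing. Occlusion needs only one extra model call per word. LIME instead samples many random perturbations and can capture interactions between words.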

Implementation Analysis of Data Classification Approach for Sentiment Classification

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.36613 ◽

2021 ◽

Vol 9 (VII) ◽

pp. 1509-1512

Author(s):

Bhushan R. Chincholkar

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Language Processing ◽

Text Analysis ◽

Hate Speech ◽

Modern Technology ◽

Quality Information ◽

Classification Approach ◽

Potential Benefits ◽

Better Than

Sentiment analysis is one of the fastest-growing fields, and its demand and potential benefits increase every day. Sentiment analysis aims to classify the polarity of a document through natural language processing and text analysis. With the help of the internet and modern technology, the amount of data has grown tremendously. Every individual can freely express his or her own ideas on social media, and all of this data can be analyzed to extract benefits and quality information. This paper focuses on cyber-hate classification of public opinions or views, since the spread of hate speech through social media can have disruptive impacts on social sentiment analysis. In particular, it proposes a modified approach with two-stage training. The approach handles text ambiguity and classifies sentiment into three classes: positive, negative, and neutral. Its performance is then compared with commonly used sentiment classifiers known to perform well on this task, as well as with some existing fuzzy approaches. The experimental results indicate that the modified approach performs marginally better than the other algorithms.

Comparison of pretraining models and strategies for health-related social media text classification

10.1101/2021.09.28.21264253 ◽

2021 ◽

Author(s):

Yuting Guo ◽

Yao Ge ◽

Yuan-Chi Yang ◽

Mohammed Ali Al-Garadi ◽

Abeed Sarker

Keyword(s):

Social Media ◽

Language Processing ◽

Text Classification ◽

High Performance ◽

Language Models ◽

Learning Performance ◽

Health Related ◽

Social Media Text ◽

Performance Results ◽

Better Than

Motivation: Pretrained contextual language models proposed in the recent past have been reported to achieve state-of-the-art performance in many natural language processing (NLP) tasks. Such models need to be benchmarked on targeted NLP tasks, and effective pretraining strategies for improving machine learning performance need to be explored. Results: In this work, we addressed the task of health-related social media text classification. We benchmarked five models (RoBERTa, BERTweet, TwitterBERT, BioClinical_BERT, and BioBERT) on 22 tasks. We then tried to boost the performance of the best models by comparing three pretraining strategies: domain-adaptive pretraining (DAPT), source-adaptive pretraining (SAPT), and topic-specific pretraining (TSPT). RoBERTa and BERTweet performed comparably in most tasks and better than the other models. Among the pretraining strategies, SAPT performed better than or comparably to the off-the-shelf models and significantly outperformed DAPT. SAPT+TSPT showed consistently high performance, with a statistically significant improvement in one task. Our findings demonstrate that RoBERTa and BERTweet are excellent off-the-shelf models for health-related social media text classification. Extended pretraining with SAPT and TSPT can further improve performance.

A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media

Complex Networks and Their Applications VIII - Studies in Computational Intelligence ◽

10.1007/978-3-030-36687-2_77 ◽

2019 ◽

pp. 928-940 ◽

Cited By ~ 3

Author(s):

Marzieh Mozafari ◽

Reza Farahbakhsh ◽

Noël Crespi

Keyword(s):

Social Media ◽

Transfer Learning ◽

Hate Speech ◽

Learning Approach ◽

Speech Detection ◽

Online Social Media

Text Classification Algorithms: A Survey

Information ◽

10.3390/info10040150 ◽

2019 ◽

Vol 10 (4) ◽

pp. 150 ◽

Cited By ~ 93

Author(s):

Kowsari ◽

Jafari Meimandi ◽

Heidarysafa ◽

Mendu ◽

Barnes ◽

...

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Language Processing ◽

Text Classification ◽

Classification Algorithms ◽

Learning Approaches ◽

Machine Learning Methods ◽

Linear Relationships ◽

Reduction Methods ◽

Complex Models

In recent years, there has been exponential growth in the number of complex documents and texts. Classifying them accurately in many applications requires a deeper understanding of machine learning methods. Many machine learning approaches have achieved remarkable results in natural language processing. The success of these learning algorithms relies on their capacity to understand complex models and non-linear relationships within data. However, finding suitable structures, architectures, and techniques for text classification remains a challenge for researchers. This paper gives a brief overview of text classification algorithms. The overview covers different text feature extraction methods, dimensionality reduction methods, existing algorithms and techniques, and evaluation methods. Finally, the limitations of each technique and their application to real-world problems are discussed.

Tweeting Grenfell: Discourse and networks in critical constructions of British Muslim social boundaries on social media

New Media & Society ◽

10.1177/1461444819864572 ◽

2019 ◽

Vol 22 (3) ◽

pp. 449-469 ◽

Cited By ~ 3

Author(s):

Joseph Downing ◽

Richard Dron

Keyword(s):

Social Media ◽

Social Network ◽

Thematic Analysis ◽

Hate Speech ◽

Social Boundaries ◽

Fake News ◽

International Space ◽

British Muslims ◽

Per Se ◽

Methodological Approaches

The Grenfell fire has yet to be analysed for its implications for the construction of social boundaries for British Muslims. This research applies two methodological approaches to understand social boundary construction on Twitter. The first is a thematic analysis of tweet content. The second is a social network analysis (SNA) of how messages are diffused and contested. Twitter is shown to be an important platform for spreading positive narratives about Muslims during the fire, enabling individuals to spontaneously contest fake news and hate narratives. This finding runs counter to established knowledge, demonstrating that social media is not, per se, a conduit for fake news and hate speech. Furthermore, the study shows how Twitter offers Muslims an international space to voice and articulate themselves. There, they can be influential in debates that affect Muslim diasporas in other national contexts.
