Transfer learning for Twitter sentiment analysis: Choosing an effective source dataset

2020 ◽  
Author(s):  
Eliseu Guimarães ◽  
Jonnathan Carvalho ◽  
Aline Paes ◽  
Alexandre Plastino

Sentiment analysis on social media data can be challenging, among other reasons because labeled data for training is not always available. Transfer learning approaches address this problem by leveraging a labeled source domain to obtain a model for a target domain that is different from but related to the source domain. The question that then arises is how to choose suitable source data for training the target classifier; this choice can be made by considering the similarity between source and target data, measured with distance metrics. This article investigates the relation between these distance metrics and the classifiers' performance. For this purpose, we propose to evaluate four metrics combined with distinct dataset representations. Computational experiments, conducted in the Twitter sentiment analysis scenario, showed that the cosine similarity metric combined with a bag-of-words representation normalized with term frequency-inverse document frequency (TF-IDF) gave the best results in terms of predictive power, in many cases outperforming even the classifiers trained with the target dataset.
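The source-selection idea in this abstract can be illustrated with a minimal sketch. It assumes each dataset is collapsed into a single bag-of-words vector, weighted with a smoothed TF-IDF, and that candidate sources are ranked by cosine similarity to the target. The function names, the whitespace tokenizer, the IDF smoothing, and the toy data are all illustrative assumptions; the article's exact representation and metrics may differ.

```python
import math
from collections import Counter


def tfidf_vectors(datasets):
    """Map each dataset name to a sparse TF-IDF vector (dict: term -> weight).

    Assumption: every dataset is treated as one document made of all its tweets.
    """
    counts = {
        name: Counter(word for tweet in tweets for word in tweet.lower().split())
        for name, tweets in datasets.items()
    }
    n_docs = len(counts)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    # Smoothed inverse document frequency (one common variant, chosen here for illustration).
    idf = {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}
    return {
        name: {w: tf * idf[w] for w, tf in term_counts.items()}
        for name, term_counts in counts.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_sources(target_name, datasets):
    """Return candidate source datasets sorted by similarity to the target, best first."""
    vectors = tfidf_vectors(datasets)
    target = vectors[target_name]
    scores = {
        name: cosine(vec, target)
        for name, vec in vectors.items()
        if name != target_name
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data, not taken from the article.
toy = {
    "target": ["great phone love it", "bad battery hate it"],
    "movies": ["great movie love it", "bad plot hate it"],
    "weather": ["rain tomorrow in the north", "sunny skies expected"],
}
print(rank_sources("target", toy))
```

In this toy setting the "movies" dataset shares sentiment vocabulary with the target and ranks first, while "weather" shares no terms and scores zero.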

Healthcare ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. 307
Author(s):  
Li Zhang ◽  
Haimeng Fan ◽  
Chengxia Peng ◽  
Guozheng Rao ◽  
Qing Cong

The widespread use of social media provides a large amount of data for public sentiment analysis. Using machine learning-based approaches on social media data, researchers can study public opinions on human papillomavirus (HPV) vaccines, which helps us understand the reasons behind the low vaccine coverage. However, social media data is usually unannotated, and data annotation is costly. The lack of an abundant annotated dataset limits the effective training of deep learning models. To tackle this problem, we propose three transfer learning approaches to analyze the public sentiment on HPV vaccines on Twitter. The first, called DWE-BiGRU-Att, transfers static embeddings and embeddings from language models (ELMo) and then processes them with a bidirectional gated recurrent unit with attention (BiGRU-Att). The other two fine-tune pre-trained models with limited annotated data: fine-tuned generative pre-training (GPT) and fine-tuned bidirectional encoder representations from transformers (BERT). The fine-tuned GPT model was built on the pre-trained GPT model, and the fine-tuned BERT model was built on the pre-trained BERT model. The experimental results on the HPV dataset demonstrated the efficacy of the three methods in the sentiment analysis of the HPV vaccination task. The fine-tuned BERT model outperforms all other methods. This can help to find strategies to improve vaccine uptake.


Author(s):  
Jafar Tahmoresnezhad ◽  
Sattar Hashemi

One of the serious challenges in machine learning and pattern recognition is to transfer knowledge from related but different domains to a new unlabeled domain. Feature selection with maximum mean discrepancy (f-MMD) is a novel and effective approach to transfer knowledge from a source domain (training set) into a target domain (test set) where the training and test sets are drawn from different distributions. However, f-MMD struggles with datasets that have a large number of samples and features. Moreover, f-MMD ignores the feature-label relation when finding the reduced representation of the dataset. In this paper, we jointly exploit transfer learning and class discrimination to cope with the domain shift problem in cases where the distribution difference is considerably large. We therefore put forward a novel transfer learning and class discrimination approach, referred to as RandOm k-samplesets feature Weighting Approach (ROWA). Specifically, ROWA reduces the distribution difference across domains in an unsupervised manner, where no label is available in the test set. Moreover, ROWA exploits the feature-label relation to separate the classes alongside the domain transfer, and strengthens the relation between the selected features and the source domain labels. In this work, we employ disjoint or overlapping small-sized samplesets to converge iteratively to the final solution. Using local sets together with a novel optimization problem yields a robust and effective reduced representation for adaptation across domains. Extensive experiments on real and synthetic datasets verify that ROWA can significantly outperform state-of-the-art transfer learning approaches.
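For readers unfamiliar with maximum mean discrepancy, the quantity underlying f-MMD can be sketched in its simplest form. The code below assumes a linear kernel, in which case the (biased) squared MMD reduces to the squared Euclidean distance between the feature means of the two domains. It is plain MMD only, not f-MMD or ROWA, and the function names are illustrative.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def linear_mmd2(source, target):
    """Squared MMD with a linear kernel: ||mean(source) - mean(target)||^2."""
    mean_s = mean_vector(source)
    mean_t = mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(mean_s, mean_t))


# Hypothetical toy samples: two domains with the same mean but different spread.
source_samples = [[0.0, 0.0], [2.0, 2.0]]
target_samples = [[1.0, 1.0], [1.0, 1.0]]
print(linear_mmd2(source_samples, target_samples))
```

The toy example also shows a limitation of the linear kernel: domains that differ only in spread, not in mean, are indistinguishable, which is one reason nonlinear kernels are commonly used in practice.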


2021 ◽  
Author(s):  
Vadim Moshkin ◽  
Andrew Konstantinov ◽  
Nadezhda Yarushkina ◽  
Alexander Dyrnochkin

2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Jun He ◽  
Xiang Li ◽  
Yong Chen ◽  
Danfeng Chen ◽  
Jing Guo ◽  
...  

In mechanical fault diagnosis, it is impossible to collect massive labeled samples with the same distribution in real industrial settings. Transfer learning, a promising method, is usually used to address this critical problem. However, as the number of samples increases, the interdomain distribution discrepancy measurement of existing methods has a higher computational complexity, which may worsen the generalization ability of the method. To solve the problem, we propose a deep transfer learning method based on 1D-CNN for rolling bearing fault diagnosis. First, a one-dimensional convolutional neural network (1D-CNN), as the basic framework, is used to extract features from the vibration signal. CORrelation ALignment (CORAL) is employed to minimize the marginal distribution discrepancy between the source domain and the target domain. Then, the cross-entropy loss function and the Adam optimizer are used to minimize the classification errors and the second-order statistics of feature distance between the source domain and the target domain, respectively. Finally, based on the bearing datasets of Case Western Reserve University and Jiangnan University, seven transfer fault diagnosis comparison experiments are carried out. The results show that our method has better performance.
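The second-order alignment mentioned above can be sketched as follows. The abstract does not state the loss formula, so this sketch assumes the commonly used CORAL loss: the squared Frobenius norm of the difference between source and target feature covariance matrices, scaled by 1/(4d^2), where d is the feature dimension. The function names and toy features are illustrative; in the described method the features would come from the 1D-CNN.

```python
def covariance(rows):
    """Sample covariance matrix (n - 1 denominator) of a list of feature vectors."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(column) / n for column in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def coral_loss(source, target):
    """Assumed CORAL loss: ||C_s - C_t||_F^2 / (4 * d^2)."""
    cov_s = covariance(source)
    cov_t = covariance(target)
    d = len(cov_s)
    squared_frobenius = sum(
        (cov_s[i][j] - cov_t[i][j]) ** 2 for i in range(d) for j in range(d)
    )
    return squared_frobenius / (4 * d * d)


# Hypothetical toy features: variance lies along different axes in each domain.
source_features = [[0.0, 0.0], [2.0, 0.0]]
target_features = [[0.0, 0.0], [0.0, 2.0]]
print(coral_loss(source_features, target_features))
```

Because the loss depends only on covariances, shifting every target feature by a constant leaves it unchanged; it penalizes differences in second-order statistics, not in means.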


2020 ◽  
pp. 193-201 ◽  
Author(s):  
Hayder A. Alatabi ◽  
Ayad R. Abbas

In recent years, social media has come into widespread use worldwide; statistics indicate that more than three billion people are on social media, producing large quantities of online data. To analyze these data, a classification method known as sentiment analysis is used. This paper presents a new sentiment analysis system based on machine learning techniques, which aims to extract the polarity of social media texts. With machine learning techniques, sentiment analysis has achieved great success around the world. This paper investigates this topic and proposes a sentiment analysis system built on the Bayesian Rough Decision Tree (BRDT) algorithm. The experimental results show that the system achieves an accuracy of more than 95% on social media data.


Author(s):  
Kranti Vithal Ghag ◽  
Ketan Shah

The bag-of-words approach is popularly used for sentiment analysis. It maps the terms in the reviews to term-document vectors and thus disrupts the syntactic structure of sentences in the reviews. Association among the terms, or the semantic structure of sentences, is also not preserved. This research work focuses on classifying sentiments by considering the syntactic and semantic structure of the sentences in the review. To improve accuracy, sentiment classifiers based on relative frequency, average frequency, and term frequency-inverse document frequency were proposed. To handle terms with apostrophes, preprocessing techniques were extended. To focus on opinionated content, subjectivity extraction was performed at the phrase level. Experiments were performed on Pang & Lee's, Kaggle's, and UCI's datasets. Classifiers were also evaluated on UCI's Product and Restaurant dataset. Sentiment classification accuracy improved from 67.9% for a comparable term weighting technique, DeltaTFIDF, up to 77.2% for the proposed classifiers. Introducing the proposed concept-based approach, subjectivity extraction, and extensions to preprocessing techniques improved the accuracy to 93.9%.


Author(s):  
S. M. Mazharul Hoque Chowdhury ◽  
Sheikh Abujar ◽  
Ohidujjaman ◽  
Khalid Been Md. Badruzzaman ◽  
Syed Akhter Hossain

Author(s):  
Shalin Hai-Jew

Sentiment analysis has been used to assess people's feelings, attitudes, and beliefs, ranging from positive to negative, on a variety of phenomena. Several new autocoding features in NVivo 11 Plus enable the capturing of sentiment analysis and extraction of themes from text datasets. This chapter describes eight scenarios in which these tools may be applied to social media data, to (1) profile egos and entities, (2) analyze groups, (3) explore metadata for latent public conceptualizations, (4) examine trending public issues, (5) delve into public concepts, (6) observe public events, (7) analyze brand reputation, and (8) inspect text corpora for emergent insights.

