Topic Modeling as a Tool to Gauge Political Sentiments from Twitter Feeds

2020 ◽  
Vol 9 (2) ◽  
pp. 14-35
Author(s):  
Debabrata Sarddar ◽  
Raktim Kumar Dey ◽  
Rajesh Bose ◽  
Sandip Roy

As ubiquitous as it is, the Internet has spawned a slew of products that have forever changed the way one thinks of society and politics. This article proposes a model to predict the chances of a political party winning based on data collected from the Twitter microblogging website, chosen because it is the most popular microblogging platform in the world. Using unsupervised topic modeling and the NRC Emotion Lexicon, the authors demonstrate how it is possible to predict results by analyzing eight types of emotions expressed by users on Twitter. To support the results with empirical analysis, the authors examine the Twitter messages posted during the 14th Gujarat Legislative Assembly election, 2017. Implementing two unsupervised clustering methods, K-means and Latent Dirichlet Allocation (LDA), this research shows how the proposed model is able to examine and summarize observations based on the underlying semantic structures of messages posted on Twitter. These two well-known unsupervised clustering methods provide a firm base for the proposed model and enable objective streamlining of decision-making processes.
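The emotion-scoring step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the lexicon `TOY_LEXICON` is a hypothetical hand-made stand-in (its entries are not taken from the actual NRC Emotion Lexicon), the tokenizer is a naive whitespace split, and the function name `emotion_counts` is an assumption.

```python
from collections import Counter

# The eight emotion categories used by the NRC Emotion Lexicon.
EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")

# Hypothetical toy lexicon for illustration only; the real lexicon
# maps many thousands of words to these categories.
TOY_LEXICON = {
    "win": {"joy", "anticipation"},
    "corrupt": {"anger", "disgust"},
    "promise": {"trust", "anticipation"},
    "riot": {"fear", "anger"},
}


def emotion_counts(tweets, lexicon=TOY_LEXICON):
    """Count how often each emotion is triggered across a list of tweets."""
    counts = Counter({emotion: 0 for emotion in EMOTIONS})
    for tweet in tweets:
        for token in tweet.lower().split():
            # Strip common punctuation and hashtag/mention markers.
            word = token.strip(".,!?#@")
            for emotion in lexicon.get(word, ()):
                counts[emotion] += 1
    return counts
```

Calling `emotion_counts` on the tweets that mention one party gives an eight-value emotion profile for that party, which can then be compared across parties.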

2021 ◽  
pp. 1-16
Author(s):  
Ibtissem Gasmi ◽  
Mohamed Walid Azizi ◽  
Hassina Seridi-Bouchelaghem ◽  
Nabiha Azizi ◽  
Samir Brahim Belhaouari

A Context-Aware Recommender System (CARS) suggests more relevant services by adapting them to the user’s specific context. However, using many contextual factors can increase data sparsity, while using too few fails to capture contextual effects in recommendations. Moreover, several CARSs are based on similarity measures such as cosine similarity and the Pearson correlation coefficient, which are not very effective on sparse datasets. This paper presents a context-aware model that integrates contextual factors into the prediction process when there are insufficient co-rated items. The proposed algorithm uses Latent Dirichlet Allocation (LDA) to learn the latent interests of users from the textual descriptions of items. It then integrates both the explicit contextual factors and their degree of importance into the prediction process by introducing a weighting function. The particle swarm optimization (PSO) algorithm is employed to learn and optimize the weights of these features. Results on the MovieLens 1M dataset show that the proposed model can achieve an F-measure of 45.51% with a precision of 68.64%. Furthermore, the improvement in MAE and RMSE can reach 41.63% and 39.69%, respectively, compared with state-of-the-art techniques.
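For readers unfamiliar with the evaluation metrics quoted above, the standard definitions of MAE, RMSE and the F-measure can be written as a short sketch. The function names and the example ratings are illustrative assumptions; none of the numbers below come from the paper.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length rating lists."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating lists."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up ratings, purely for illustration.
true_ratings = [4, 3, 5]
predicted_ratings = [3, 3, 4]
print(mae(true_ratings, predicted_ratings))
print(rmse(true_ratings, predicted_ratings))
```

Lower MAE and RMSE indicate more accurate rating predictions, whereas a higher F-measure indicates better recommendation lists.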


2021 ◽  
Author(s):  
Shimon Ohtani

The importance of biodiversity conservation is gradually being recognized worldwide, and 2020 was the final year of the Aichi Biodiversity Targets formulated at the 10th Conference of the Parties to the Convention on Biological Diversity (COP10) in 2010. Unfortunately, the majority of the targets were assessed as unachievable. While it is essential to measure public awareness of biodiversity when setting the post-2020 targets, it is also a difficult task to propose a method to do so. This study provides a diachronic exploration of the discourse on “biodiversity” from 2010 to 2020, using Twitter posts, in combination with sentiment analysis and topic modeling, which are commonly used in data science. Through the aggregation and comparison of n-grams, the visualization of eight types of emotional tendencies using the NRC emotion lexicon, the construction of topic models using Latent Dirichlet allocation (LDA), and the qualitative analysis of tweet texts based on these models, I was able to classify and analyze unstructured tweets in a meaningful way. The results revealed the evolution of words used with “biodiversity” on Twitter over the past decade, the emotional tendencies behind the contexts in which “biodiversity” has been used, and the approximate content of tweet texts that have constituted topics with distinctive characteristics. While the search for people's awareness through SNS analysis still has many limitations, it is undeniable that important suggestions can be obtained. In order to further refine the research method, it will be essential to improve the skills of analysts and accumulate research examples as well as to advance data science.


2020 ◽  
Author(s):  
Kai Zhang ◽  
Yuan Zhou ◽  
Zheng Chen ◽  
Yufei Liu ◽  
Zhuo Tang ◽  
...  

The prevalence of short texts on the Web has made mining the latent topic structures of short texts a critical and fundamental task for many applications. However, due to the lack of word co-occurrence information induced by the content sparsity of short texts, it is challenging for traditional topic models like latent Dirichlet allocation (LDA) to extract coherent topic structures on short texts. Incorporating external semantic knowledge into the topic modeling process is an effective strategy to improve the coherence of inferred topics. In this paper, we develop a novel topic model, called the biterm correlation knowledge-based topic model (BCK-TM), to infer latent topics from short texts. Specifically, the proposed model mines biterm correlation knowledge automatically based on recent progress in word embedding, which can represent semantic information of words in a continuous vector space. To incorporate external knowledge, a knowledge incorporation mechanism is designed over the latent topic layer to regularize the topic assignment of each biterm during the topic sampling process. Experimental results on three public benchmark datasets illustrate the superior performance of the proposed approach over several state-of-the-art baseline models.
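To make the notion of a biterm concrete, the sketch below extracts unordered word pairs that co-occur in the same short text and counts them across a corpus. It covers only this preprocessing idea under simple assumptions (whitespace tokenization, each whole text treated as one context); it is not the BCK-TM model itself, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short text."""
    # Sorting each pair makes ("a", "b") and ("b", "a") the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def biterm_counts(docs):
    """Count biterms over a corpus of short texts (naive whitespace split)."""
    counts = Counter()
    for doc in docs:
        counts.update(extract_biterms(doc.lower().split()))
    return counts
```

Because biterms are pooled over the whole corpus, co-occurrence statistics become denser than those of individual short documents, which is the sparsity problem the abstract refers to.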


2021 ◽  
Vol 10 (1) ◽  
pp. 23-30
Author(s):  
Muhammad Habibi ◽  
Adri Priadana ◽  
Muhammad Rifqi Ma’arif

According to the World Health Organization (WHO), the COVID-19 outbreak had resulted in more than six million confirmed cases and more than 371,000 deaths globally as of June 1, 2020. The outbreak sparked a flood of scientific research to help society deal with the virus, both inside and outside the medical domain. Research on public health analysis and on public conversations about the spread of COVID-19 on social media is one of the main focuses of researchers worldwide. Information from social media can be analyzed as supporting data about public health. Analyzing public conversations helps the relevant authorities understand public opinion and the information gaps between them and the public, which in turn helps them develop appropriate emergency response strategies to address existing problems in the community during the pandemic and provides information on the population's emotions in different contexts. However, research on public health and public conversations has so far been conducted only through supervised analysis of textual data. In this study, we aim to analyze the sentiment and topics of Indonesian public conversations about COVID-19 on Twitter using NLP techniques. We applied several methods to analyze sentiment in order to identify the best classification method. Topic modeling was carried out in an unsupervised manner using Latent Dirichlet Allocation (LDA). The results of this study reveal that the most frequently discussed topic related to the COVID-19 pandemic is economic issues.


Author(s):  
Ho Van Nguyen ◽  
Ho Trung Thanh

Recently, with the growth of technology and the Internet, customers can easily give their opinions and feedback about products and services on websites or social media. This information is stored in text form and is a huge source of data to explore. In order to keep developing and meet customers' needs, businesses need insight into what customers discuss and care about. In this study, we first collected a corpus of 99,322 customer comments and reviews written in English from several e-commerce websites in the hospitality industry. After pre-processing the collected data, our team conducted experiments on this corpus and chose the best number of topics (K) using Perplexity and Coherence Score measurements; K serves as the input parameter for the model. Finally, an experiment was run on the corpus using the Latent Dirichlet Allocation (LDA) model with the chosen K to explore the topics. The model found hidden topics, each with a corresponding list of keywords, reflecting the issues that customers are interested in. Applying the empirical results from the model will support decision making to improve products and services, as well as the management and development of businesses in the hospitality sector.
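One way to picture a coherence measurement is the UMass variant, sketched below. The abstract does not say which coherence measure was used, so the choice of UMass is an assumption, as are the function name `umass_coherence` and the toy review data.

```python
import math


def umass_coherence(top_words, documents):
    """UMass-style coherence of one topic's top words.

    `top_words` is ordered from most to least probable word in the topic;
    `documents` is a list of token lists.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            denominator = doc_freq(top_words[j])
            if denominator == 0:
                continue  # skip words that never occur in the corpus
            joint = doc_freq(top_words[i], top_words[j])
            score += math.log((joint + 1) / denominator)
    return score


# Toy tokenized reviews, invented for illustration.
reviews = [["room", "clean", "staff"], ["room", "clean"], ["pool", "staff"]]
print(umass_coherence(["room", "clean"], reviews))
print(umass_coherence(["room", "pool"], reviews))
```

In practice one would train a model for each candidate K, average the coherence over its topics, and keep the K with the best score; that selection loop is omitted here because it depends on the LDA implementation in use.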


2021 ◽  
Vol 17 (2) ◽  
pp. 168-180
Author(s):  
Aris Yaman ◽  
Bagus Sartono ◽  
Agus M. Soleh

Introduction. Fertilizer is one of the most important production factors in agriculture. It is crucial to increase technological capacity related to fertilizers. Analysis of patent documents can be one way to analyze technological developments, especially in fertilizers. Data Collection Methods. The data used in this research are metadata, specifically the titles and abstracts of patent documents in Indonesia. Patent metadata matching the keyword "fertilizer" were processed for the 1945-2017 period. Data Analysis. The LDA model can provide a reasonable interpretation regarding topic modeling based on text data. Results and Discussion. The results find that the degree of the patent title is better than that of the patent abstract. The LDA approach can adequately separate the topics of fertilizer patent technology so that they are not open to multiple interpretations. Conclusion. Based on the findings, there are nine essential topics in the development of fertilizer technology. There is also a lack of technology collaboration between IPC technology sections.


2021 ◽  
pp. 147078532110400
Author(s):  
Pablo Marshall

Mindset metrics, the measurement of consumers’ perceptions, attitudes, and intentions, have a long tradition in marketing, particularly in advertising and branding. Some of the most common mindset metrics are brand awareness, brand image, personality traits, and attribute importance. Brand awareness and other mindset measures take the form of texts (bags of words), so a natural methodology for analyzing these variables is topic modeling, in particular the popular Latent Dirichlet allocation (LDA) model. The LDA methodology assumes that brands or concepts are represented by clusters of brands in consumers’ minds. This study proposes an extension/modification of the LDA model for brand awareness and other mindset variables that incorporates Bernoulli observations instead of the Multinomial specification present in the usual LDA model. This extension is relevant since, unlike words in texts, brands and mindset concepts are not repeated within a document and have a dichotomous form: present or absent. The proposed model is applied to two brand awareness datasets. The results show significant gains in managerial insights for analyzing both brand clusters and consumers’ profiles.
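The dichotomous "present or absent" encoding that motivates the Bernoulli specification can be illustrated with a small sketch. It shows only the data representation, not the proposed model; the function name `presence_matrix` and the brand labels are hypothetical.

```python
def presence_matrix(responses):
    """Encode each respondent's recalled brands as a 0/1 presence vector."""
    # Sorted vocabulary gives a stable column order.
    vocabulary = sorted({brand for response in responses for brand in response})
    rows = []
    for response in responses:
        recalled = set(response)  # repeats collapse: a brand is present or not
        rows.append([1 if brand in recalled else 0 for brand in vocabulary])
    return vocabulary, rows


# Hypothetical survey answers; "brand_b" is listed twice by respondent 2.
answers = [["brand_b", "brand_a"], ["brand_c", "brand_b", "brand_b"]]
vocabulary, rows = presence_matrix(answers)
print(vocabulary)
print(rows)
```

A Multinomial bag-of-words model would instead keep per-word counts, which carry no extra information when each brand can appear at most once per respondent.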


Author(s):  
Nguyen Van Ho ◽  
Ho Trung Thanh

Recently, with the growth of technology and the Internet, customers can easily share their opinions and feedback about hotel products and services on websites or social media. This information is stored in textual form and is a huge source of data to explore. In order to keep developing and meet customers' needs, businesses need insight into what customers discuss and care about. In this study, we first collected a corpus of 26,482 customer comments and reviews written in English from several e-commerce websites in the hospitality industry. After preprocessing the collected data, our team conducted experiments on this corpus and chose the best number of topics (K) using Coherence Score measurements; K serves as the input parameter for the model. Finally, an experiment was run on the corpus using the Latent Dirichlet Allocation (LDA) model with the chosen K to explore the topics. The model found hidden topics, each with a corresponding list of keywords, reflecting the issues that customers are interested in. Applying the empirical results from the model will support decision making to improve products and services, as well as the management and development of businesses in the hotel sector.


Author(s):  
Nestor J. Zaluzec

The Information SuperHighway, Email, the Internet, FTP, BBS, modems: all buzzwords which are becoming more and more routine in our daily life. Confusing terminology? Hopefully it won't be in a few minutes; all you need is a handle on a few basic concepts and terms, and you will be on-line with the rest of the "telecommunication experts". These terms all refer to some type or aspect of tools associated with a range of computer-based communication software and hardware. They are in fact far less complex than the instruments we use on a day-to-day basis as microscopists and microanalysts. The key is for each of us to know what each is and how to make use of the wealth of information which they can make available to us for the asking. Basically, all of these items relate to mechanisms and protocols by which we as scientists can exchange information rapidly and efficiently with colleagues in the office down the hall, or half-way around the world, using computers and various communications media. The purpose of this tutorial/paper is to outline and demonstrate the basic ideas of some of the major information systems available to all of us today. For the sake of simplicity we will break this presentation down into two distinct (but, as we shall see later, connected) areas: telecommunications over conventional phone lines, and telecommunications by computer networks. Live tutorial/demonstrations of both procedures will be presented in the Computer Workshop/Software Exchange during the course of the meeting.

