Topic Modeling as a Tool to Gauge Political Sentiments from Twitter Feeds

Debabrata Sarddar; Raktim Kumar Dey; Rajesh Bose; Sandip Roy

doi:10.4018/ijncr.2020040102

Enhanced context-aware recommendation using topic modeling and particle swarm optimization

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210331 ◽

2021 ◽

pp. 1-16

Author(s):

Ibtissem Gasmi ◽

Mohamed Walid Azizi ◽

Hassina Seridi-Bouchelaghem ◽

Nabiha Azizi ◽

Samir Brahim Belhaouari

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

State Of The Art ◽

Weighting Function ◽

Contextual Factors ◽

Pearson Correlation ◽

Correlation Coefficients ◽

Pso Algorithm ◽

Context Aware ◽

Proposed Model

Context-aware recommender systems (CARS) suggest more relevant services by adapting them to the user’s specific context. However, using many contextual factors can increase data sparsity, while using too few fails to capture contextual effects in recommendations. Moreover, several CARSs rely on similarity measures, such as cosine similarity and the Pearson correlation coefficient, which are not very effective on sparse datasets. This paper presents a context-aware model that integrates contextual factors into the prediction process when there are insufficient co-rated items. The proposed algorithm uses Latent Dirichlet Allocation (LDA) to learn the latent interests of users from the textual descriptions of items. It then integrates both the explicit contextual factors and their degree of importance into the prediction process by introducing a weighting function. The particle swarm optimization (PSO) algorithm is employed to learn and optimize the weights of these features. The results on the MovieLens 1M dataset show that the proposed model can achieve an F-measure of 45.51% with a precision of 68.64%. Furthermore, the improvement in MAE and RMSE can reach 41.63% and 39.69%, respectively, compared with state-of-the-art techniques.
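The sparsity problem mentioned in this abstract can be illustrated with a minimal sketch of Pearson similarity restricted to co-rated items. This is not the paper's algorithm; the function name `pearson_on_corated` and the sample ratings are hypothetical and serve only to show why the measure becomes undefined when two users share too few rated items.

```python
from math import sqrt


def pearson_on_corated(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated.

    Returns None when the correlation is undefined: fewer than two
    co-rated items, or zero variance in either user's co-rated ratings.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return None
    xs = [ratings_a[item] for item in common]
    ys = [ratings_b[item] for item in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings keyed by item id (not taken from MovieLens).
alice = {"m1": 5, "m2": 3, "m3": 4}
bob = {"m1": 4, "m2": 2, "m3": 3}
carol = {"m3": 5, "m7": 1}

dense = pearson_on_corated(alice, bob)     # three co-rated items
sparse = pearson_on_corated(alice, carol)  # only one co-rated item
```

With only one shared item, `sparse` is `None`: the similarity cannot be computed at all, which is the situation the abstract describes as "insufficient co-rated items".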

How is People's Awareness of “Biodiversity” Measured? Using Sentiment Analysis and LDA Topic Modeling in the Twitter Discourse Space from 2010 to 2020

10.21203/rs.3.rs-922908/v1 ◽

2021 ◽

Author(s):

Shimon Ohtani

Keyword(s):

Sentiment Analysis ◽

Topic Modeling ◽

Data Science ◽

Latent Dirichlet Allocation ◽

Biological Diversity ◽

Public Awareness ◽

Convention On Biological Diversity ◽

Emotion Lexicon ◽

Aichi Biodiversity Targets ◽

Do So

The importance of biodiversity conservation is gradually being recognized worldwide, and 2020 was the final year of the Aichi Biodiversity Targets formulated at the 10th Conference of the Parties to the Convention on Biological Diversity (COP10) in 2010. Unfortunately, the majority of the targets were assessed as unachievable. While it is essential to measure public awareness of biodiversity when setting the post-2020 targets, it is also a difficult task to propose a method to do so. This study provides a diachronic exploration of the discourse on “biodiversity” from 2010 to 2020, using Twitter posts, in combination with sentiment analysis and topic modeling, which are commonly used in data science. Through the aggregation and comparison of n-grams, the visualization of eight types of emotional tendencies using the NRC emotion lexicon, the construction of topic models using Latent Dirichlet allocation (LDA), and the qualitative analysis of tweet texts based on these models, I was able to classify and analyze unstructured tweets in a meaningful way. The results revealed the evolution of words used with “biodiversity” on Twitter over the past decade, the emotional tendencies behind the contexts in which “biodiversity” has been used, and the approximate content of tweet texts that have constituted topics with distinctive characteristics. While the search for people's awareness through SNS analysis still has many limitations, it is undeniable that important suggestions can be obtained. In order to further refine the research method, it will be essential to improve the skills of analysts and accumulate research examples as well as to advance data science.
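Two of the steps named in this abstract, n-gram aggregation and lexicon-based emotion tallies, can be sketched as follows. This is an illustrative sketch only: `TOY_LEXICON` is a made-up three-word stand-in for the NRC emotion lexicon, the tokenizer is deliberately naive, and the sample tweets are invented.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> set of emotion labels. The real NRC
# emotion lexicon is far larger and is not reproduced here.
TOY_LEXICON = {
    "loss": {"sadness", "fear"},
    "protect": {"trust"},
    "hope": {"anticipation", "joy"},
}


def tokenize(text):
    """Lowercase, split on whitespace, and strip edge punctuation."""
    tokens = (word.strip(".,!?\"'#") for word in text.lower().split())
    return [t for t in tokens if t]


def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def emotion_counts(tokens, lexicon):
    """Count one hit per emotion label for every token found in the lexicon."""
    counts = Counter()
    for token in tokens:
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    return counts


# Invented example tweets.
tweets = [
    "Biodiversity loss is accelerating.",
    "We must protect biodiversity, there is hope!",
]
bigram_totals = Counter()
emotion_totals = Counter()
for tweet in tweets:
    tokens = tokenize(tweet)
    bigram_totals.update(ngrams(tokens, 2))
    emotion_totals.update(emotion_counts(tokens, TOY_LEXICON))
```

Aggregating such counters per year would give the kind of diachronic comparison the study describes, although the study's actual preprocessing is not specified here.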

Incorporating Biterm Correlation Knowledge into Topic Modeling for Short Texts

The Computer Journal ◽

10.1093/comjnl/bxaa079 ◽

2020 ◽

Author(s):

Kai Zhang ◽

Yuan Zhou ◽

Zheng Chen ◽

Yufei Liu ◽

Zhuo Tang ◽

...

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Semantic Knowledge ◽

Superior Performance ◽

Knowledge Based ◽

Modeling Process ◽

Proposed Model ◽

Benchmark Datasets ◽

Latent Topic

The prevalence of short texts on the Web has made mining the latent topic structures of short texts a critical and fundamental task for many applications. However, due to the lack of word co-occurrence information induced by the content sparsity of short texts, it is challenging for traditional topic models like latent Dirichlet allocation (LDA) to extract coherent topic structures on short texts. Incorporating external semantic knowledge into the topic modeling process is an effective strategy to improve the coherence of inferred topics. In this paper, we develop a novel topic model, called the biterm correlation knowledge-based topic model (BCK-TM), to infer latent topics from short texts. Specifically, the proposed model mines biterm correlation knowledge automatically based on recent progress in word embedding, which can represent semantic information of words in a continuous vector space. To incorporate external knowledge, a knowledge incorporation mechanism is designed over the latent topic layer to regularize the topic assignment of each biterm during the topic sampling process. Experimental results on three public benchmark datasets illustrate the superior performance of the proposed approach over several state-of-the-art baseline models.
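For readers unfamiliar with the term, a biterm is an unordered pair of words that co-occur in the same short text. The sketch below shows only that extraction step, under the simplifying assumption that each distinct word is counted once per text; it does not implement BCK-TM's knowledge incorporation or sampling. The function name `biterms` and the example documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def biterms(tokens):
    """Return every unordered pair of distinct words in one short text.

    Sorting the vocabulary first gives each pair a canonical order, so
    ("apple", "iphone") and ("iphone", "apple") count as the same biterm.
    """
    vocab = sorted(set(tokens))
    return list(combinations(vocab, 2))


# Invented, already-tokenized short texts.
short_texts = [
    ["apple", "iphone", "launch"],
    ["apple", "iphone", "price"],
]

# Corpus-level biterm counts: pooling pairs across texts is what gives
# biterm-based models more co-occurrence evidence than a single short text.
biterm_counts = Counter()
for text in short_texts:
    biterm_counts.update(biterms(text))
```

Here `("apple", "iphone")` is observed twice across the corpus even though each text is only three words long.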

Sentiment Analysis and Topic Modeling of Indonesian Public Conversation about COVID-19 Epidemics on Twitter

IJID (International Journal on Informatics for Development) ◽

10.14421/ijid.2021.2400 ◽

2021 ◽

Vol 10 (1) ◽

pp. 23-30

Author(s):

Muhammad Habibi ◽

Adri Priadana ◽

Muhammad Rifqi Ma’arif

Keyword(s):

Public Health ◽

Social Media ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

World Health ◽

The Public ◽

Response Strategies ◽

Existing Problems ◽

The World ◽

Health Organization

The World Health Organization (WHO) reported that, as of June 1, 2020, the COVID-19 outbreak had resulted in more than six million confirmed cases and more than 371,000 deaths globally. The outbreak sparked a flood of scientific research to help society deal with the virus, both inside and outside the medical domain. Research on public health and on public conversations about the spread of COVID-19 on social media is one of the areas receiving the most attention from researchers worldwide. Information from social media can serve as supporting data about public health. Analyzing public conversations helps the relevant authorities understand public opinion and the information gaps between them and the public, which in turn helps them develop appropriate emergency response strategies to address existing problems in the community during the pandemic; it also provides information on the population's emotions in different contexts. However, research on public health and public conversations has so far been conducted only through supervised analysis of textual data. In this study, we analyze the sentiment and topics of Indonesian public conversations about COVID-19 on Twitter using NLP techniques. We applied several sentiment analysis methods in order to identify the best classification method. Topic modeling was carried out in an unsupervised manner using Latent Dirichlet Allocation (LDA). The results reveal that the most frequently discussed topic related to the COVID-19 pandemic is economic issues.

A novel model for analyzing online customer experience in hotel services approach by topic modeling

Science & Technology Development Journal - Economics - Law and Management ◽

10.32508/stdjelm.v4i3.656 ◽

2020 ◽

Vol 4 (3) ◽

pp. First

Author(s):

Ho Van Nguyen ◽

Ho Trung Thanh

Keyword(s):

Decision Making ◽

Topic Modeling ◽

Hospitality Industry ◽

Latent Dirichlet Allocation ◽

The Internet ◽

Coherence Score ◽

Management And Development ◽

Hospitality Sector ◽

Novel Model ◽

Support Decision Making

Recently, with the growth of technology and the Internet, customers can easily give their opinions and feedback about products and services on websites or social media. This information is stored in text form and is a huge source of data to explore. To keep developing and meeting customers' needs, businesses need insight into what customers discuss and care about. In this study, we first collected a corpus of 99,322 customer comments and reviews written in English from several e-commerce websites in the hospitality industry. After preprocessing the collected data, our team conducted experiments on this corpus and chose the best number of topics (K), as measured by perplexity and coherence score, as the input parameter for the model. Finally, a Latent Dirichlet Allocation (LDA) model with K topics was run on the corpus to explore the topics. The model found hidden topics, each with a corresponding list of keywords, reflecting the issues that customers are interested in. Applying the empirical results from the model will support decision making to improve products and services, as well as the management and development of businesses in the hospitality sector.
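As a rough illustration of selecting K by perplexity, the sketch below computes perplexity from per-token log-probabilities and picks the candidate K with the lowest value. It assumes the standard definition (the exponential of the negative mean log-likelihood per token); the helper names and the perplexity values per K are made up and are not results from the study, which also used coherence scores.

```python
from math import exp, log


def perplexity(token_log_probs):
    """exp of the negative mean log-likelihood per held-out token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return exp(-sum(token_log_probs) / len(token_log_probs))


def best_k_by_perplexity(perplexity_by_k):
    """Pick the candidate number of topics with the lowest perplexity."""
    return min(perplexity_by_k, key=perplexity_by_k.get)


# Sanity check of the definition: a model that assigns probability 0.25
# to every token behaves like a uniform choice among 4 words.
uniform_4 = perplexity([log(0.25)] * 8)

# Made-up held-out perplexities for three candidate K values.
candidates = {5: 812.0, 10: 640.5, 20: 701.3}
chosen_k = best_k_by_perplexity(candidates)
```

In practice the per-K values would come from fitting one LDA model per candidate K and scoring held-out documents; lower perplexity does not always mean more interpretable topics, which is one reason to consult coherence as well.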

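Topic modeling of fertilizer-related patent documents in Indonesia based on Latent Dirichlet Allocation (original title: Pemodelan topik pada dokumen paten terkait pupuk di Indonesia berbasis Latent Dirichlet Allocation)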

Berkala Ilmu Perpustakaan dan Informasi ◽

10.22146/bip.v17i2.2147 ◽

2021 ◽

Vol 17 (2) ◽

pp. 168-180

Author(s):

Aris Yaman ◽

Bagus Sartono ◽

Agus M. Soleh

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Patent Document ◽

Text Data ◽

Production Factors ◽

The World ◽

Technological Developments ◽

Collection Methods ◽

Fertilizer Technology ◽

Better Than

Introduction. Fertilizer is one of the most important production factors in agriculture. It is crucial to increase technological capacity related to fertilizers. Analysis of patent documents is one way to analyze technological developments, particularly for fertilizers. Data Collection Methods. The data used in this research are metadata, specifically the titles and abstracts of patent documents in Indonesia. Patent metadata matching the keyword "fertilizer" were processed for the 1945-2017 period. Data Analysis. The LDA model can provide a reasonable interpretation of topics based on text data. Results and Discussion. The results find that the degree of the patent title is better than that of the patent abstract. The LDA approach can adequately separate the topics of fertilizer patent technology so that they are not open to multiple interpretations. Conclusion. Based on the findings, there are nine essential topics in the development of fertilizer technology. There is also a lack of technology collaboration between IPC technology sections.

A Latent Allocation Model for Brand Awareness and Mindset Metrics

International Journal of Market Research ◽

10.1177/14707853211040052 ◽

2021 ◽

pp. 147078532110400

Author(s):

Pablo Marshall

Keyword(s):

Personality Traits ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Brand Image ◽

Brand Awareness ◽

Bag Of Words ◽

Attribute Importance ◽

Allocation Model ◽

Proposed Model ◽

Dirichlet Allocation

Mindset metrics, the measurement of consumers’ perceptions, attitudes, and intentions, have a long tradition in marketing, particularly in advertising and branding. Some of the most common mindset metrics are brand awareness, brand image, personality traits, and attribute importance. Brand awareness and other mindset measures take the form of texts (bags of words), so a natural methodology for analyzing them is topic modeling with the popular Latent Dirichlet Allocation (LDA) model. The LDA methodology assumes that brands or concepts are represented by clusters of brands in consumers’ minds. This study proposes an extension of the LDA model for brand awareness and other mindset variables that incorporates Bernoulli observations instead of the multinomial specification of the usual LDA model. This extension is relevant because, unlike words in texts, brands and mindset concepts are not repeated within a document and have a dichotomous form: present or absent. The proposed model is applied to two brand awareness datasets. The results show significant gains in managerial insights from analyzing brand clusters and consumers’ profiles.
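The difference between a multinomial and a Bernoulli observation model can be made concrete with a small sketch. Under a Bernoulli specification, each brand is either mentioned or not, and the log-likelihood sums one present/absent term per brand. The code below is a generic Bernoulli log-likelihood under that assumption, not the paper's full model; the brand names and probabilities are hypothetical.

```python
from math import log


def bernoulli_log_likelihood(presence, probs):
    """log p(presence | probs) for independent present/absent indicators.

    presence: list of 0/1 flags, one per brand.
    probs: mention probability per brand, each strictly between 0 and 1.
    """
    if len(presence) != len(probs):
        raise ValueError("presence and probs must have the same length")
    total = 0.0
    for flag, p in zip(presence, probs):
        if flag not in (0, 1):
            raise ValueError("presence flags must be 0 or 1")
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must lie strictly in (0, 1)")
        total += log(p) if flag == 1 else log(1.0 - p)
    return total


# Hypothetical cluster profile over three brands, and one respondent who
# mentioned the first two brands but not the third.
brands = ["BrandA", "BrandB", "BrandC"]
cluster_probs = [0.9, 0.5, 0.1]
respondent = [1, 1, 0]

ll = bernoulli_log_likelihood(respondent, cluster_probs)
```

Note that an absent brand contributes `log(1 - p)` to the score, whereas a multinomial bag-of-words likelihood has no explicit term for words that do not occur.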

How is People's Awareness of “Biodiversity” Measured? Using Sentiment Analysis and LDA Topic Modeling in the Twitter Discourse Space from 2010 to 2020

10.21203/rs.3.rs-922908/v2 ◽

2021 ◽

Author(s):

Shimon Ohtani

Topic modeling for analyzing online reviews in hotel sector

Science & Technology Development Journal - Economics - Law and Management ◽

10.32508/stdjelm.v4i4.692 ◽

2020 ◽

Vol 4 (4) ◽

pp. First

Author(s):

Nguyen Van Ho ◽

Ho Trung Thanh

Keyword(s):

Topic Modeling ◽

Hospitality Industry ◽

Latent Dirichlet Allocation ◽

Online Reviews ◽

The Internet ◽

Coherence Score ◽

Management And Development ◽

Hotel Sector ◽

Textual Form ◽

Support Decision Making

Recently, with the growth of technology and the Internet, customers can easily share their opinions and feedback about hotel products and services on websites or social media. This information is stored in textual form and is a huge source of data to explore. To keep developing and meeting customers' needs, businesses need insight into what customers discuss and care about. In this study, we first collected a corpus of 26,482 customer comments and reviews written in English from several e-commerce websites in the hospitality industry. After preprocessing the collected data, our team conducted experiments on this corpus and chose the best number of topics (K), as measured by coherence score, as the input parameter for the model. Finally, a Latent Dirichlet Allocation (LDA) model with K topics was run on the corpus to explore the topics. The model found hidden topics, each with a corresponding list of keywords, reflecting the issues that customers are interested in. Applying the empirical results from the model will support decision making to improve products and services, as well as the management and development of businesses in the hotel sector.
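The abstract does not say which coherence measure was used. As one possibility, the sketch below implements the UMass coherence score, which rewards topics whose top words tend to appear in the same documents. The choice of UMass, the function name, and the tiny review corpus are all assumptions made for illustration.

```python
from math import log


def umass_coherence(top_words, documents):
    """UMass coherence for one topic's ordered list of top words.

    Sums log((D(w_m, w_l) + 1) / D(w_l)) over every pair with l < m,
    where D counts the documents containing all the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            denom = doc_freq(top_words[l])
            if denom == 0:
                continue  # word absent from the corpus: term is undefined, skip it
            joint = doc_freq(top_words[m], top_words[l])
            score += log((joint + 1) / denom)
    return score


# Invented, already-tokenized hotel reviews.
reviews = [
    ["room", "clean"],
    ["room", "clean"],
    ["room", "staff"],
    ["pool", "staff"],
]

coherent = umass_coherence(["room", "clean"], reviews)
incoherent = umass_coherence(["room", "pool"], reviews)
```

Scores closer to zero indicate more coherent topics under this measure; one way to choose K is to fit a model per candidate K, average this score over its topics, and keep the K with the highest average.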

A hitchhiker’s guide to microscopy and microanalysis using telecommunications, e-mail, and the Internet

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100169687 ◽

1994 ◽

Vol 52 ◽

pp. 390-391 ◽

Cited By ~ 1

Author(s):

Nestor J. Zaluzec

Keyword(s):

The Internet ◽

Information Superhighway ◽

The World ◽

Exchange Information ◽

On Line ◽

Computer Based ◽

Software And Hardware ◽

Communication Software ◽

Basic Ideas ◽

E Mail

The Information Superhighway, e-mail, the Internet, FTP, BBS, modems: all buzzwords which are becoming more and more routine in our daily lives. Confusing terminology? Hopefully it won't be in a few minutes; all you need is a handle on a few basic concepts and terms, and you will be on-line with the rest of the "telecommunication experts". These terms all refer to some type or aspect of the tools associated with a range of computer-based communication software and hardware. They are in fact far less complex than the instruments we use on a day-to-day basis as microscopists and microanalysts. The key is for each of us to know what each one is and how to make use of the wealth of information it can make available to us for the asking. Basically, all of these items relate to mechanisms and protocols by which we as scientists can exchange information rapidly and efficiently with colleagues in the office down the hall or halfway around the world, using computers and various communications media. The purpose of this tutorial/paper is to outline and demonstrate the basic ideas of some of the major information systems available to all of us today. For the sake of simplicity we will break this presentation down into two distinct (but, as we shall see later, connected) areas: telecommunications over conventional phone lines, and telecommunications by computer networks. Live tutorials/demonstrations of both procedures will be presented in the Computer Workshop/Software Exchange during the course of the meeting.