What is the Conversation About?

In social commerce, customers are becoming an important information source for companies. Customers use Web 2.0 communication platforms such as Twitter to express their sentiments about products or to discuss their experiences with them. These sentiments can be very important for product development or for improving marketing strategies. The research goal is to analyze customer sentiments on Twitter. The first step is to detect topics in Twitter entries that contain patterns of interest. For topic detection, the authors use Latent Dirichlet Allocation (LDA) topic modeling. The authors found event-based topics in the exemplary context of Sony’s 3D TV sets. In future work, the authors will implement sentiment analysis algorithms to determine the sentiments in the entries corresponding to the detected topics.
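As a rough illustration of the topic-detection step described above, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling on tokenized tweets. It is not the authors' implementation: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy tweets are all assumptions made for this example.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs is a list of token lists. Returns (doc_topic, topic_word), where
    doc_topic[d][k] counts tokens of document d assigned to topic k and
    topic_word[k][w] counts assignments of word w to topic k.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


# Hypothetical, already tokenized tweets (not data from the study).
tweets = [
    "3d tv launch event price".split(),
    "launch event 3d tv demo".split(),
    "glasses headache 3d picture".split(),
    "picture quality glasses 3d".split(),
]
doc_topic, topic_word = lda_gibbs(tweets, num_topics=2)
for k, counts in enumerate(topic_word):
    print(k, [w for w, _ in counts.most_common(3)])
```

In practice a library implementation would normally be preferred; the sketch only shows which counts the sampler maintains and how each token's topic is resampled.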

Emerging Research Topic Detection Using Filtered-LDA

AI ◽

10.3390/ai2040035 ◽

2021 ◽

Vol 2 (4) ◽

pp. 578-599

Author(s):

Fuad Alattar ◽

Khaled Shaalan

Keyword(s):

Final Stage ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Research Topic ◽

Main Topic ◽

Topic Detection ◽

Scientific Papers ◽

Leveraging Text Mining Approach to Identify What People Want to Know About Mental Disorders from Online Inquiry Platforms (Preprint)

10.2196/preprints.32389 ◽

2021 ◽

Author(s):

Jin-Ah Sim ◽

Soowon Park

Keyword(s):

Mental Disorders ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Stressful Life Events ◽

Social Stigma ◽

Information Source ◽

Educational Programs ◽

General Information ◽

Social Adaptation ◽

Word Frequencies

BACKGROUND Online inquiry platforms, where people can ask questions anonymously, have become an important information source for those who are concerned about the social stigma and discrimination that follow mental disorders. Therefore, examining what people inquire about regarding mental disorders would be useful when designing educational programs for communities. OBJECTIVE The present study aimed to examine the contents of the queries regarding mental disorders that were posted on online inquiry platforms. METHODS A total of 4,714 relevant queries from the two major online inquiry platforms were collected. We computed word frequencies and centralities and applied latent Dirichlet allocation (LDA) topic modeling. RESULTS Words such as “symptom,” “hospital,” and “treatment” ranked as the most frequently used, and the word “my” appeared to have the highest centrality. LDA identified four topics: (1) understanding general symptoms, (2) disability grading system and welfare entitlement, (3) stressful life events, and (4) social adaptation with mental disorders. CONCLUSIONS People are interested in practical information concerning mental disorders, such as social benefits, social adaptation, and more general information about the symptoms and the treatments. Our findings suggest that instructions encompassing different scopes of information are needed when developing educational programs.

Leveraging Text Mining Approach to Identify What People Want to Know About Mental Disorders From Online Inquiry Platforms

Frontiers in Public Health ◽

10.3389/fpubh.2021.759802 ◽

2021 ◽

Vol 9 ◽

Author(s):

Soowon Park ◽

Yaeji Kim-Knauss ◽

Jin-ah Sim

Keyword(s):

Mental Disorders ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Stressful Life Events ◽

Social Stigma ◽

Information Source ◽

Educational Programs ◽

General Information ◽

Social Adaptation ◽

Word Frequencies

Online inquiry platforms, where people can ask questions anonymously, have become an important information source for those who are concerned about the social stigma and discrimination that follow mental disorders. Therefore, examining what people inquire about regarding mental disorders would be useful when designing educational programs for communities. The present study aimed to examine the contents of the queries regarding mental disorders that were posted on online inquiry platforms. A total of 4,714 relevant queries from the two major online inquiry platforms were collected. We computed word frequencies and centralities and applied latent Dirichlet allocation (LDA) topic modeling. Words such as “symptom,” “hospital,” and “treatment” ranked as the most frequently used, and the word “my” appeared to have the highest centrality. LDA identified four latent topics: (1) the understanding of general symptoms, (2) a disability grading system and welfare entitlement, (3) stressful life events, and (4) social adaptation with mental disorders. People are interested in practical information concerning mental disorders, such as social benefits, social adaptation, and more general information about the symptoms and the treatments. Our findings suggest that instructions encompassing different scopes of information are needed when developing educational programs.
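The word-frequency and centrality computations mentioned above can be sketched as follows. The abstract does not say which centrality measure was used, so this example assumes degree centrality on a word co-occurrence network in which two words are linked when they appear in the same query. The function names and the toy queries are hypothetical.

```python
from collections import Counter
from itertools import combinations


def word_frequencies(queries):
    """Count how often each word occurs across all tokenized queries."""
    return Counter(word for query in queries for word in query)


def degree_centrality(queries):
    """Degree centrality on a within-query co-occurrence network.

    Assumption for this sketch: two words are neighbors if they occur in
    the same query; centrality is the neighbor count divided by (n - 1).
    """
    neighbors = {}
    for query in queries:
        for a, b in combinations(sorted(set(query)), 2):
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {word: 0.0 for word in neighbors}
    return {word: len(nb) / (n - 1) for word, nb in neighbors.items()}


# Hypothetical tokenized queries (not data from the study).
queries = [
    ["my", "symptom", "hospital"],
    ["my", "treatment", "hospital"],
    ["my", "symptom", "welfare"],
]
frequencies = word_frequencies(queries)
centrality = degree_centrality(queries)
print(frequencies.most_common(3))
print(max(centrality, key=centrality.get))
```

A word can have the highest centrality without carrying topical meaning, which is consistent with a pronoun such as “my” ranking first in queries written from a personal perspective.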

Topic Modeling for Amharic User Generated Texts

Information ◽

10.3390/info12100401 ◽

2021 ◽

Vol 12 (10) ◽

pp. 401

Author(s):

Girma Neshir ◽

Andreas Rauber ◽

Solomon Atnafu

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Topic Model ◽

Sampling Technique ◽

Neural Nets ◽

Supervised Machine Learning ◽

Support Vector ◽

Topic Detection ◽

Learning Tools ◽

Statistical Process

Topic modeling is a statistical process that derives latent themes from extensive collections of text. Three approaches to topic modeling exist: unsupervised, semi-supervised, and supervised. In this work, we develop a supervised topic model for an Amharic corpus. We also investigate the effect of stemming on topic detection with Term Frequency-Inverse Document Frequency (TF-IDF) features, Latent Dirichlet Allocation (LDA) features, and a combination of these two feature sets, using four supervised machine learning tools: Support Vector Machine (SVM), Naive Bayes (NB), Logistic Regression (LR), and Neural Nets (NN). We evaluate our approach on an Amharic corpus of 14,751 documents in ten topic categories. Both qualitative and quantitative analyses of the results show that our proposed supervised topic detection performs best, with an accuracy of 88%, when an SVM uses state-of-the-art TF-IDF word features with the Synthetic Minority Over-sampling Technique (SMOTE) and no stemming. The results show that text features with stemming slightly improve the performance of the topic classifier over features with no stemming.
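To make the TF-IDF feature step concrete, here is a minimal sketch of one common TF-IDF variant (relative term frequency multiplied by the natural log of inverse document frequency). The exact weighting used in the paper is not stated in the abstract, so the formula, the function name `tfidf`, and the toy English tokens are assumptions. The resulting vectors would then be passed to a classifier such as an SVM, which is not shown here.

```python
from collections import Counter
from math import log


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {
                term: (count / len(doc)) * log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return vectors


# Hypothetical tokenized documents standing in for the Amharic corpus.
docs = [
    ["match", "goal", "match"],
    ["vote", "goal"],
]
vectors = tfidf(docs)
print(vectors[0])
```

A term that appears in every document, such as "goal" here, receives a weight of zero under this variant, so it contributes nothing to separating the topic categories.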

Enhanced context-aware recommendation using topic modeling and particle swarm optimization

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210331 ◽

2021 ◽

pp. 1-16

Author(s):

Ibtissem Gasmi ◽

Mohamed Walid Azizi ◽

Hassina Seridi-Bouchelaghem ◽

Nabiha Azizi ◽

Samir Brahim Belhaouari

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

State Of The Art ◽

Weighting Function ◽

Contextual Factors ◽

Pearson Correlation ◽

Correlation Coefficients ◽

Pso Algorithm ◽

Context Aware ◽

Proposed Model

A Context-Aware Recommender System (CARS) suggests more relevant services by adapting them to the user’s specific context. Nevertheless, using many contextual factors can increase data sparsity, while using too few fails to introduce contextual effects into the recommendations. Moreover, several CARSs are based on similarity measures such as cosine similarity and the Pearson correlation coefficient. These methods are not very effective on sparse datasets. This paper presents a context-aware model that integrates contextual factors into the prediction process when there are insufficient co-rated items. The proposed algorithm uses Latent Dirichlet Allocation (LDA) to learn the latent interests of users from the textual descriptions of items. Then, it integrates both the explicit contextual factors and their degree of importance into the prediction process by introducing a weighting function. The particle swarm optimization (PSO) algorithm is employed to learn and optimize the weights of these features. The results on the MovieLens 1M dataset show that the proposed model can achieve an F-measure of 45.51% with a precision of 68.64%. Furthermore, the improvements in MAE and RMSE reach 41.63% and 39.69%, respectively, compared with state-of-the-art techniques.
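The sparsity problem noted above for similarity-based methods can be illustrated with a small sketch of the Pearson correlation between two users, computed only over the items both have rated. This is a generic textbook formulation, not the paper's model; the function name, the choice to take means over co-rated items only, and the toy ratings are assumptions.

```python
from math import sqrt


def pearson_corated(ratings_u, ratings_v):
    """Pearson correlation over items rated by both users.

    ratings_u and ratings_v map item ids to ratings. Returns None when
    fewer than two items are co-rated or when either user's co-rated
    ratings have zero variance, since the coefficient is then undefined.
    """
    common = sorted(set(ratings_u) & set(ratings_v))
    if len(common) < 2:
        return None
    xs = [ratings_u[item] for item in common]
    ys = [ratings_v[item] for item in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings on a 1-5 scale.
user_a = {"m1": 5, "m2": 3, "m3": 1}
user_b = {"m1": 5, "m2": 4, "m3": 3}
user_c = {"m9": 4}  # shares no rated item with user_a

similarity_ab = pearson_corated(user_a, user_b)
similarity_ac = pearson_corated(user_a, user_c)
print(similarity_ab, similarity_ac)
```

When two users share no rated items, as with `user_a` and `user_c`, no similarity can be computed at all, which is the situation that motivates drawing on item descriptions and contextual factors instead.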

Innovation in an Emerging Market: A Bibliometric and Latent Dirichlet Allocation Based Topic Modeling Study

2020 International Conference on Decision Aid Sciences and Application (DASA) ◽

10.1109/dasa51403.2020.9317278 ◽

2020 ◽

Author(s):

Mohd Faiz Hilmi ◽

Yanti Mustapha ◽

Mohammad Tasyriq Che Omar

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Emerging Market ◽

Modeling Study ◽

Dirichlet Allocation

Mining Open Government Data for Business Intelligence Using Data Visualization: A Two-Industry Case Study

Journal of theoretical and applied electronic commerce research ◽

10.3390/jtaer16040059 ◽

2021 ◽

Vol 16 (4) ◽

pp. 1042-1065

Author(s):

Anne Gottfried ◽

Caroline Hartmann ◽

Donald Yates

Keyword(s):

Data Visualization ◽

Business Intelligence ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Open Government ◽

Open Government Data ◽

Market Opportunities ◽

Government Data ◽

Source Of Information

The business intelligence (BI) market has grown at a tremendous rate in the past decade due to technological advancements, big data, and the availability of open source content. Despite this growth, the use of open government data (OGD) as a source of information is very limited in the private sector due to a lack of knowledge of its benefits. Scant evidence on the use of OGD by private organizations suggests that it can lead to the creation of innovative ideas and assist in making better-informed decisions. Given the benefits but lack of use of OGD to generate business intelligence, we extend research in this area by exploring how OGD can be used to generate business intelligence for the identification of market opportunities and strategy formulation, an area of research that is still in its infancy. Using a two-industry case study approach (footwear and lumber), we use latent Dirichlet allocation (LDA) topic modeling to extract emerging topics in these two industries from OGD, and a data visualization tool (pyLDAvis) to visualize the topics in order to interpret and transform the data into business intelligence. Additionally, we perform an environmental scan of the two industries to validate the usability of the information obtained. The results provide evidence that OGD can be a valuable source of information for generating business intelligence and demonstrate how topic modeling and visualization tools can assist organizations in extracting and analyzing information for the identification of market opportunities.

Exposing Emerging Trends in Smart Sustainable City Research Using Deep Autoencoders-Based Fuzzy C-Means

Sustainability ◽

10.3390/su13052876 ◽

2021 ◽

Vol 13 (5) ◽

pp. 2876

Author(s):

Anne Parlina ◽

Kalamullah Ramli ◽

Hendri Murfi

Keyword(s):

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Smart Cities ◽

Detection Methods ◽

Sustainable City ◽

Fuzzy C Means ◽

Progressive Development ◽

Emerging Trends ◽

Energy Environment ◽

Non Negative Matrix Factorization

The literature discussing the concepts, technologies, and ICT-based urban innovation approaches of smart cities has been growing, along with initiatives from cities all over the world that are competing to improve their services and become smart and sustainable. However, studies that provide a comprehensive understanding of smart and sustainable city research and reveal its trends and characteristics are still lacking. Meanwhile, policymakers and practitioners alike need to pursue progressive development. In response to this shortcoming, this research offers content analysis studies based on topic modeling approaches to capture the evolution and characteristics of topics in the scientific literature on smart and sustainable city research. More importantly, a novel topic-detection algorithm based on deep learning and clustering techniques, namely deep autoencoders-based fuzzy C-means (DFCM), is introduced for analyzing the research topic trend. The topics generated by the proposed algorithm have relatively higher coherence values than those generated by previously used topic detection methods, namely non-negative matrix factorization (NMF), latent Dirichlet allocation (LDA), and eigenspace-based fuzzy C-means (EFCM). The 30 main topics that appeared in topic modeling with the DFCM algorithm were classified into six groups (technology, energy, environment, transportation, e-governance, and human capital and welfare) that characterize the six dimensions of smart, sustainable city research.
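For readers unfamiliar with the fuzzy C-means component named above, the sketch below shows the standard membership computation of plain fuzzy C-means for a single point. It does not reproduce DFCM: the autoencoder stage is omitted, and the function name, the fuzzifier value `m=2.0`, and the toy coordinates are assumptions.

```python
from math import dist


def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy C-means membership of one point in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the
    Euclidean distance from the point to center i and m > 1 is the
    fuzzifier. Memberships sum to 1.
    """
    distances = [dist(point, center) for center in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    zero_hits = [d == 0 for d in distances]
    if any(zero_hits):
        share = 1.0 / sum(zero_hits)
        return [share if hit else 0.0 for hit in zero_hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# Hypothetical 2-D document embedding and two cluster centers.
memberships = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])
print(memberships)
```

Unlike hard clustering, each document receives a graded membership in every cluster, so a paper that spans two research themes can contribute to both topics.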

Analysis and Visualization Latent Topic on COVID-19 Vaccine Tweet use two-stage topic modeling (Preprint)

10.2196/preprints.30290 ◽

2021 ◽

Author(s):

Faizah Faizah ◽

Bor-Shen Lin

Keyword(s):

Topic Modeling ◽

Public Perception ◽

Latent Dirichlet Allocation ◽

World Health ◽

Two Stage ◽

The Public ◽

Global Pandemic ◽

Difficult Time ◽

Latent Topic ◽

Latent Topics

BACKGROUND The World Health Organization (WHO) declared COVID-19 a global pandemic on January 30, 2020. However, the pandemic is not yet over. Furthermore, in the first quarter of 2021, some countries faced a third wave of the pandemic. During this difficult time, the development of vaccines for COVID-19 has accelerated rapidly. Understanding the public perception of COVID-19 vaccines from data collected on social media can widen the perspective on the state of the global pandemic. OBJECTIVE This study explores and analyzes the latent topics in COVID-19 vaccine tweets posted by individuals from various countries by using two-stage topic modeling. METHODS A two-stage analysis in topic modeling was proposed to investigate people’s reactions in five countries. The first stage is Latent Dirichlet Allocation (LDA), which produces the latent topics with the corresponding term distributions that help the investigators understand the main issues or opinions. The second stage then performs agglomerative clustering on the latent topics based on the Hellinger distance, which merges close topics hierarchically into topic clusters so that those topics can be visualized in either tree or graph views. RESULTS In general, the topics of discussion regarding the COVID-19 vaccine are similar across the five countries. Topic themes such as "first vaccine" and "vaccine effect" dominate the public discussion. Notably, people in some countries have country-specific topic themes, such as "politician opinion" and "stay home" in Canada, "emergency" in India, and "blood clots" in the United Kingdom. The analysis also shows which COVID-19 vaccine is the most popular and is gaining more public interest. CONCLUSIONS With LDA and hierarchical clustering, two-stage topic modeling is powerful for visualizing the latent topics and understanding the public perception of the COVID-19 vaccine.
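The second stage described above, agglomerative clustering of topics by Hellinger distance, can be sketched as follows. The Hellinger distance formula is standard; the linkage criterion is not given in the abstract, so average linkage, the stopping `threshold`, the function names, and the toy topic-term distributions are assumptions made for illustration.

```python
from math import sqrt


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return sqrt(sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q))) / sqrt(2)


def agglomerate(topics, threshold):
    """Merge the closest topic clusters until none is within threshold.

    Uses average linkage (an assumption of this sketch). Returns a list
    of clusters, each a list of topic indices.
    """
    clusters = [[i] for i in range(len(topics))]

    def linkage(c1, c2):
        total = sum(hellinger(topics[i], topics[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        distance, x, y = min(
            (linkage(a, b), x, y)
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        if distance > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical topic-term distributions over a three-word vocabulary.
topics = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]
clusters = agglomerate(topics, threshold=0.3)
print(clusters)
```

Recording the order and distance of each merge, rather than stopping at a threshold, would yield the hierarchy needed for a tree view of the topics.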

How is People's Awareness of “Biodiversity” Measured? Using Sentiment Analysis and LDA Topic Modeling in the Twitter Discourse Space from 2010 to 2020

10.21203/rs.3.rs-922908/v1 ◽

2021 ◽

Author(s):

Shimon Ohtani

Keyword(s):

Sentiment Analysis ◽

Topic Modeling ◽

Data Science ◽

Latent Dirichlet Allocation ◽

Biological Diversity ◽

Public Awareness ◽

Convention On Biological Diversity ◽

Emotion Lexicon ◽

Aichi Biodiversity Targets ◽

Do So

The importance of biodiversity conservation is gradually being recognized worldwide, and 2020 was the final year of the Aichi Biodiversity Targets formulated at the 10th Conference of the Parties to the Convention on Biological Diversity (COP10) in 2010. Unfortunately, the majority of the targets were assessed as unachievable. While it is essential to measure public awareness of biodiversity when setting the post-2020 targets, it is also a difficult task to propose a method to do so. This study provides a diachronic exploration of the discourse on “biodiversity” from 2010 to 2020, using Twitter posts, in combination with sentiment analysis and topic modeling, which are commonly used in data science. Through the aggregation and comparison of n-grams, the visualization of eight types of emotional tendencies using the NRC emotion lexicon, the construction of topic models using latent Dirichlet allocation (LDA), and the qualitative analysis of tweet texts based on these models, I was able to classify and analyze unstructured tweets in a meaningful way. The results revealed the evolution of words used with “biodiversity” on Twitter over the past decade, the emotional tendencies behind the contexts in which “biodiversity” has been used, and the approximate content of tweet texts that have constituted topics with distinctive characteristics. While the search for people's awareness through SNS analysis still has many limitations, it is undeniable that important suggestions can be obtained. In order to further refine the research method, it will be essential to improve the skills of analysts and accumulate research examples as well as to advance data science.
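The aggregation and comparison of n-grams mentioned above can be sketched in a few lines. This is only an illustration of the general idea: the function name, the choice of bigrams, and the toy tweets for the two years are hypothetical and are not taken from the study.

```python
from collections import Counter


def ngram_counts(tokenized_tweets, n=2):
    """Count n-grams (as tuples of tokens) across tokenized tweets."""
    counts = Counter()
    for tokens in tokenized_tweets:
        # zip over n shifted copies of the token list yields the n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts


# Hypothetical tokenized tweets for two years.
tweets_2010 = [
    ["protect", "biodiversity", "now"],
    ["biodiversity", "loss", "now"],
]
tweets_2020 = [
    ["biodiversity", "loss", "crisis"],
    ["biodiversity", "loss", "accelerates"],
]
counts_2010 = ngram_counts(tweets_2010)
counts_2020 = ngram_counts(tweets_2020)

# Counter subtraction keeps only bigrams that became more frequent.
rising = counts_2020 - counts_2010
print(rising.most_common(3))
```

Comparing such counts year by year gives a simple view of how the words used together with a keyword shift over time.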
