Extracting Hierarchy of Coherent User-Concerns to Discover Intricate User Behavior from User Reviews

Author(s):  
Ligaj Pradhan ◽  
Chengcui Zhang ◽  
Steven Bethard

Intricate user behaviors can be understood by discovering user interests from their reviews. Topic modeling techniques have been extensively explored to discover latent user interests from user reviews. However, a topic extracted by topic modeling techniques can be a mixture of several quite different concepts and is thus less interpretable. In this paper, the authors present a method that uses topic modeling techniques to discover a large number of topics and applies hierarchical clustering to generate a much smaller number of interpretable User-Concerns. These User-Concerns are then compared with topics generated by Latent Dirichlet Allocation (LDA) and the Pachinko Allocation Model (PAM) and shown to be more coherent and interpretable. The authors cut the linkage tree formed during the hierarchical clustering at different levels to generate a hierarchy of User-Concerns. They also discuss how collaborative-filtering-based recommendation systems can be enriched by infusing additional user-behavioral knowledge from such a hierarchy.
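
The idea of clustering many fine-grained topics into fewer groups and cutting the tree at several levels can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only: the abstract does not state the distance measure or linkage criterion, so cosine distance and average linkage are used, and the topic-word vectors are made-up numbers standing in for the output of a topic model.

```python
import math
from itertools import combinations


def cosine_distance(p, q):
    """1 - cosine similarity between two topic-word vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm


def agglomerate(topics, n_clusters):
    """Greedy average-linkage clustering, stopped at n_clusters groups.

    Stopping at a given number of groups plays the role of cutting
    the linkage tree at one level.
    """
    clusters = [[i] for i in range(len(topics))]
    while len(clusters) > n_clusters:

        def avg_dist(pair):
            a, b = pair
            dists = [cosine_distance(topics[i], topics[j])
                     for i in clusters[a] for j in clusters[b]]
            return sum(dists) / len(dists)

        # Merge the two closest clusters.
        a, b = min(combinations(range(len(clusters)), 2), key=avg_dist)
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical topic-word distributions over a 4-word vocabulary.
topics = [
    [0.70, 0.20, 0.05, 0.05],
    [0.65, 0.25, 0.05, 0.05],
    [0.05, 0.05, 0.70, 0.20],
    [0.05, 0.05, 0.20, 0.70],
    [0.10, 0.80, 0.05, 0.05],
]

# Cutting at two levels gives a small two-level hierarchy of topic groups.
hierarchy = {k: agglomerate(topics, k) for k in (3, 2)}
for level, groups in hierarchy.items():
    print(level, groups)
```

Because the merge order is deterministic, every group at the coarser level is a union of groups at the finer level, which is what makes the result a hierarchy rather than two unrelated partitions.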

2021 ◽  
Vol 26 (6) ◽  
Author(s):  
Camila Costa Silva ◽  
Matthias Galster ◽  
Fabian Gilson

Topic modeling using models such as Latent Dirichlet Allocation (LDA) is a text mining technique to extract human-readable semantic “topics” (i.e., word clusters) from a corpus of textual documents. In software engineering, topic modeling has been used to analyze textual data in empirical studies (e.g., to find out what developers talk about online), but also to build new techniques to support software engineering tasks (e.g., to support source code comprehension). Topic modeling needs to be applied carefully (e.g., depending on the type of textual data analyzed and modeling parameters). Our study aims at describing how topic modeling has been applied in software engineering research with a focus on four aspects: (1) which topic models and modeling techniques have been applied, (2) which textual inputs have been used for topic modeling, (3) how textual data was “prepared” (i.e., pre-processed) for topic modeling, and (4) how generated topics (i.e., word clusters) were named to give them a human-understandable meaning. We analyzed topic modeling as applied in 111 papers from ten highly-ranked software engineering venues (five journals and five conferences) published between 2009 and 2020. We found that (1) LDA and LDA-based techniques are the most frequent topic modeling techniques, (2) developer communication and bug reports have been modeled most, (3) data pre-processing and modeling parameters vary quite a bit and are often vaguely reported, and (4) manual topic naming (such as deducing names based on frequent words in a topic) is common.


2021 ◽  
pp. 147078532110400
Author(s):  
Pablo Marshall

Mindset metrics, the measurement of consumers’ perceptions, attitudes, and intentions, have a long tradition in marketing, particularly in advertising and branding. Some of the most common mindset metrics are brand awareness, brand image, personality traits, and attribute importance. Brand awareness and other mindset measures take the form of texts (bags of words), so a natural methodology for analyzing these variables is topic modeling, in particular the popular latent Dirichlet allocation (LDA) model. The LDA methodology assumes that brands or concepts are represented by clusters of brands in consumers’ minds. This study proposes an extension/modification of the LDA model for brand awareness and other mindset variables that incorporates Bernoulli observations instead of the Multinomial specification present in the usual LDA specification. This extension is relevant since, unlike words in texts, brands and mindset concepts are not repeated within a document and have a dichotomous form, present or absent. The proposed model is applied to two brand awareness datasets. The results show significant gains in managerial insight for analyzing both brand clusters and consumers’ profiles.


2019 ◽  
Vol 46 (1) ◽  
pp. 23-40 ◽  
Author(s):  
Yezheng Liu ◽  
Fei Du ◽  
Jianshan Sun ◽  
Yuanchun Jiang

User-generated content has been an increasingly important data source for analysing user interests in both industry and academic research. Since the proposal of the basic latent Dirichlet allocation (LDA) model, plenty of LDA variants have been developed to learn knowledge from unstructured user-generated content. A persistent limitation of LDA and its variants is that they may generate low-quality topics whose meanings are confusing. To handle this problem, this article proposes an interactive strategy to generate high-quality topics with clear meanings by integrating subjective knowledge derived from human experts and objective knowledge learned by LDA. The proposed interactive latent Dirichlet allocation (iLDA) model develops deterministic and stochastic approaches to obtain a subjective topic-word distribution from human experts, combines the subjective and objective topic-word distributions by a linear weighted-sum method, and provides the inference process to draw topics and words from a comprehensive topic-word distribution. The proposed model is a significant effort to integrate human knowledge with LDA-based models through an interactive strategy. The experiments on two real-world corpora show that the proposed iLDA model can draw high-quality topics with the assistance of subjective knowledge from human experts. It is robust under various conditions and offers fundamental support for the applications of LDA-based topic modelling.
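
As a rough illustration of what a linear weighted sum of a subjective and an objective topic-word distribution could look like, here is a small sketch. It is not the iLDA implementation: the function name, the `weight` parameter, the renormalization step, and the example words and probabilities are all assumptions made for this example.

```python
def combine_topic_word(objective, subjective, weight):
    """Mix two word distributions for a single topic.

    `weight` is the (hypothetical) share given to the expert-supplied
    subjective distribution; 1 - weight goes to the model-learned one.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    vocab = set(objective) | set(subjective)
    mixed = {
        w: weight * subjective.get(w, 0.0)
        + (1.0 - weight) * objective.get(w, 0.0)
        for w in vocab
    }
    # Renormalize so the result is a proper probability distribution
    # even if the inputs do not sum exactly to one.
    total = sum(mixed.values())
    return {w: p / total for w, p in mixed.items()}


# Made-up distributions for one topic.
objective = {"battery": 0.5, "screen": 0.3, "shipping": 0.2}  # learned by the model
subjective = {"battery": 0.6, "screen": 0.4}                  # supplied by an expert

combined = combine_topic_word(objective, subjective, weight=0.5)
for word in sorted(combined, key=combined.get, reverse=True):
    print(f"{word}: {combined[word]:.2f}")
```

With equal weights, a word the expert left out ("shipping" here) keeps half of its learned probability, so the expert's view pulls the topic toward the intended meaning without discarding what the model found.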


2021 ◽  
Author(s):  
Daiwei Zhang ◽  
Yue Liu ◽  
Senqi Zhang ◽  
Li Sun ◽  
Pin Li ◽  
...  

Background: Amid the COVID-19 pandemic, mental health-related symptoms (such as depression and anxiety) have been actively mentioned on social media. Objective: In this study, we aimed to monitor mental health concerns on Twitter during the COVID-19 pandemic in the United Kingdom (UK), and assess the potential impact of the COVID-19 pandemic on mental health concerns of Twitter users. Methods: We collected COVID-19 and mental health-related tweets from the UK between March 5, 2020 and January 31, 2021 through the Twitter Streaming API. We conducted topic modeling using a Latent Dirichlet Allocation model to examine discussions about mental health concerns. Deep learning algorithms including Face++ were used to infer the demographic characteristics (age and gender) of Twitter users who expressed mental health concerns related to the COVID-19 pandemic. Results: We showed a positive correlation between COVID-19-related mental health concerns on Twitter and the severity of the COVID-19 pandemic in the UK. Geographic analysis showed that populated urban areas have a higher proportion of Twitter users with mental health concerns compared to England as a whole. Topic modeling showed that general concerns, COVID-19 skeptics, and death toll were the top topics discussed in mental health-related tweets. Demographic analysis showed that middle-aged and older adults might be more likely to suffer from mental health issues or express their mental health concerns on Twitter during the COVID-19 pandemic. Conclusions: The COVID-19 pandemic has noticeable effects on mental health concerns on Twitter in the UK, which varied among demographic and geographic groups.


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Wenhao Chen ◽  
Kin Keung Lai ◽  
Yi Cai

Purpose: Sina Weibo and Twitter are the top microblogging platforms with billions of users. Accordingly, these two platforms could be used to understand the public mood. In this paper, the authors discuss how to generate and compare the public mood on Sina Weibo and Twitter. The predictive power of the public mood toward commodity markets is discussed, and the authors address the problem of how to choose between Sina Weibo and Twitter when predicting crude oil prices. Design/methodology/approach: An enhanced latent Dirichlet allocation model considering term weights is implemented to generate topics from Sina Weibo and Twitter. A Granger causality test and a long short-term memory neural network model are used to demonstrate that the public mood on Sina Weibo and Twitter is correlated with commodity contracts. Findings: By comparing the topics and the public mood on Sina Weibo and Twitter, the authors find significant differences in user behavior on these two websites. In addition, the authors demonstrate that public mood on Sina Weibo and Twitter is correlated with crude oil contract prices on the Shanghai International Energy Exchange and the New York Mercantile Exchange, respectively. Originality/value: Two sentiment analysis methods for Chinese (Sina Weibo) and English (Twitter) posts are introduced, which can be reused for other semantic analysis tasks. In addition, the authors present a prediction model for practical participants in the commodity markets and introduce a method to choose between Sina Weibo and Twitter for certain prediction tasks.


2022 ◽  
Vol 54 (7) ◽  
pp. 1-35
Author(s):  
Uttam Chauhan ◽  
Apurva Shah

We are not able to deal with a mammoth text corpus without summarizing it into a relatively small subset. A computational tool is badly needed to understand such a gigantic pool of text. Probabilistic Topic Modeling discovers and explains an enormous collection of documents by reducing it to a topical subspace. In this work, we study the background and advancement of topic modeling techniques. We first introduce the preliminaries of topic modeling techniques and review their extensions and variations, such as topic modeling over various domains, hierarchical topic modeling, word-embedded topic models, and topic models in multilingual perspectives. In addition, research on topic modeling in a distributed environment and on topic visualization approaches is explored. We also briefly cover the implementation and evaluation techniques for topic models. Comparison matrices are shown over the experimental results of the various categories of topic modeling. Diverse technical challenges and future directions are discussed.


2020 ◽  
Vol 10 (10) ◽  
pp. 3388
Author(s):  
Sung-Hwan Kim ◽  
Hwan-Gue Cho

Analyzing user behavior in online spaces is an important task. This paper is dedicated to analyzing the online community in terms of topics. We present a user–topic model based on the latent Dirichlet allocation (LDA), as an application of topic modeling in a domain other than textual data. This model substitutes the concept of word occurrence in the original LDA method with user participation. The proposed method deals with many problems regarding topic modeling and user analysis, which include: inclusion of dynamic topics, visualization of user interaction networks, and event detection. We collected datasets from four online communities with different characteristics, and conducted experiments to demonstrate the effectiveness of our method by revealing interesting findings covering numerous aspects.


2019 ◽  
Vol 2019 ◽  
pp. 1-13
Author(s):  
Qiaoqiao Tan ◽  
Fang’ai Liu

Recommendations based on user behavior sequences are becoming more and more common. Some studies treat user behavior sequences directly as interests, ignoring the mining and representation of implicit features. However, user behaviors contain a lot of information, such as consumption habits and dynamic preferences. In order to better locate user interests, this paper proposes a Bi-GRU neural network with attention to model users’ long-term historical preferences and short-term consumption motivations. First, a Bi-GRU network is established to solve the long-term dependence problem in sequences, and an attention mechanism is introduced to capture user interest changes related to the target item. Then, the user’s short-term interaction trajectory is modeled with self-attention to distinguish the importance of each potential feature. Finally, the long-term and short-term interests are combined to predict the next behavior. We conducted extensive experiments on the Amazon and MovieLens datasets. The experimental results demonstrate that the proposed model outperforms current state-of-the-art models on the Recall and NDCG indicators. In particular, on the MovieLens dataset, compared with other RNN-based models, our proposed model improved Recall@20 by at least 2.32%, which verifies the effectiveness of modeling the long-term and short-term interests of users separately.


Author(s):  
Zhaokun Xue ◽  
Alva Couch

We describe a recommendation system for HydroShare, a platform for scientific water data sharing. We discuss similarities, differences and challenges for implementing recommendation systems for scientific water data sharing. We discuss and analyze the behaviors that scientists exhibit in using HydroShare as documented by users’ activity logs. Unlike entertainment system users, users on HydroShare tend to be task-oriented, where the set of tasks of interest can change over time, and older interests are sometimes no longer relevant. By validating recommendation approaches against user behavior as expressed in activity logs, we conclude that a combination of content-based filtering and latent Dirichlet allocation (LDA) topic modeling of user behavior, rather than LDA classification of dataset topics, provides a workable solution for HydroShare, and we compare this approach to existing recommendation methods.

