A reliable cross-site user generated content modeling method based on topic model

Purpose: The development of social media has led large numbers of internet users to produce massive amounts of user-generated content (UGC). UGC, which directly expresses users’ opinions about events, is valuable for monitoring public opinion. Existing research has focused on analysing topic evolution in UGC, but few studies have paid attention to the emotion evolution of sub-topics about popular events. Because users’ emotions are ignored, important details about their opinions might be missed. This paper aims to extract sub-topics about a popular event from UGC and investigate the emotion evolution of each sub-topic.

Design/methodology/approach: This paper first collects UGC about a popular event as experimental data and performs subjectivity classification on the data to obtain a subjective corpus. Second, the subjective corpus is classified into different emotion categories using supervised emotion classification. Meanwhile, a topic model is used to extract sub-topics about the event from the subjective corpus. Finally, the authors use the results of emotion classification and sub-topic extraction to analyse how emotions evolve over time.

Findings: Experimental results show that specific primary emotions exist in each sub-topic and evolve differently. Moreover, the authors find that the emotion classifier performs best with term frequency and relevance frequency as the feature-weighting method.

Originality/value: To the best of the authors’ knowledge, this is the first study to mine the emotion evolution of sub-topics about an event from UGC. It mines users’ opinions about the sub-topics of an event, which may offer more details that are useful for analysing users’ emotions in preparation for decision-making.
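The abstract reports that term frequency combined with relevance frequency (TF-RF) worked best as a feature weighting, but it does not spell the weighting out. The following is a minimal sketch, assuming the commonly used definition rf = log2(2 + a / max(1, c)), where a and c are the numbers of documents containing the term in the target emotion category and in the other categories. The toy corpus and the function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def relevance_frequency(term, target_docs, other_docs):
    """rf = log2(2 + a / max(1, c)) (assumed definition).

    a: documents of the target category that contain the term
    c: documents of the other categories that contain the term
    """
    a = sum(1 for doc in target_docs if term in doc)
    c = sum(1 for doc in other_docs if term in doc)
    return math.log2(2 + a / max(1, c))


def tf_rf_vector(doc, target_docs, other_docs):
    """Weight each term of a tokenized document by raw term frequency times rf."""
    tf = Counter(doc)
    return {
        term: count * relevance_frequency(term, target_docs, other_docs)
        for term, count in tf.items()
    }


# Hypothetical tokenized posts: "angry" posts versus all other posts.
angry_docs = [["so", "angry", "today"], ["angry", "and", "sad"]]
other_docs = [["so", "happy", "today"], ["nice", "day"]]

weights = tf_rf_vector(["angry", "angry", "today"], angry_docs, other_docs)
print(weights)
```

A term concentrated in the target category ("angry") receives a larger weight than a term spread evenly across categories ("today"), which is the intuition behind using rf in place of an unsupervised factor such as idf.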


Exploring Eating Disorder Topics on Twitter: Machine Learning Approach (Preprint)

2020. DOI: 10.2196/preprints.18273

Author(s): Sicheng Zhou, Yunpeng Zhao, Jiang Bian, Ann F Haynos, Rui Zhang

Keyword(s): Machine Learning, Topic Modeling, Short Term Memory, Topic Model, Modeling Method, Mental Illnesses, Computational Method, Supervised Machine Learning, Support Vector, Domain Expert

BACKGROUND: Eating disorders (EDs) are a group of mental illnesses that have an adverse effect on both mental and physical health. As social media platforms (eg, Twitter) have become an important data source for public health research, some studies have qualitatively explored the ways in which EDs are discussed on these platforms. Initial results suggest that such research offers a promising method for further understanding this group of diseases. Nevertheless, an efficient computational method is needed to further identify and analyze tweets relevant to EDs on a larger scale.

OBJECTIVE: This study aims to develop and validate a machine learning–based classifier to identify tweets related to EDs and to explore factors (ie, topics) related to EDs using a topic modeling method.

METHODS: We collected potential ED-relevant tweets using keywords from previous studies and annotated these tweets into different groups (ie, ED relevant vs irrelevant and then promotional information vs laypeople discussion). Several supervised machine learning methods, such as convolutional neural network (CNN), long short-term memory (LSTM), support vector machine, and naïve Bayes, were developed and evaluated using annotated data. We used the classifier with the best performance to identify ED-relevant tweets and applied a topic modeling method, Correlation Explanation (CorEx), to analyze the content of the identified tweets. To validate these machine learning results, we also collected a cohort of ED-relevant tweets on the basis of manually curated rules.

RESULTS: A total of 123,977 tweets were collected during the set period. We randomly annotated 2219 tweets for developing the machine learning classifiers. We developed a CNN-LSTM classifier to identify ED-relevant tweets published by laypeople in 2 steps: first relevant versus irrelevant (F1 score=0.89) and then promotional versus published by laypeople (F1 score=0.90). A total of 40,790 ED-relevant tweets were identified using the CNN-LSTM classifier. We also identified another set of tweets (ie, 17,632 ED-relevant and 83,557 ED-irrelevant tweets) posted by laypeople using manually specified rules. Using CorEx on all ED-relevant tweets, the topic model identified 162 topics. Overall, the coherence rate for topic modeling was 77.07% (1264/1640), indicating a high quality of the produced topics. The topics were further reviewed and analyzed by a domain expert.

CONCLUSIONS: A developed CNN-LSTM classifier could improve the efficiency of identifying ED-relevant tweets compared with the traditional manual-based method. The CorEx topic model was applied on the tweets identified by the machine learning–based classifier and the traditional manual approach separately. Highly overlapping topics were observed between the 2 cohorts of tweets. The produced topics were further reviewed by a domain expert. Some of the topics identified by the potential ED tweets may provide new avenues for understanding this serious set of disorders.
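The two kinds of figures reported above, F1 scores and the coherence rate, can be reproduced with simple arithmetic. The sketch below is illustrative only: the coherence counts (1264 of 1640) come from the abstract, while the confusion-matrix counts passed to `f1_score` are hypothetical values chosen to land on 0.89, since the abstract does not report precision or recall.

```python
def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def coherence_rate(coherent_items, total_items):
    """Share of items judged coherent, as a percentage."""
    return 100.0 * coherent_items / total_items


# Counts reported in the abstract.
rate = coherence_rate(1264, 1640)
print(f"coherence rate: {rate:.2f}%")

# Hypothetical counts, for illustration only.
example_f1 = f1_score(tp=89, fp=11, fn=11)
print(f"example F1: {example_f1:.2f}")
```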


The Effects of Green Restaurant Attributes on Customer Satisfaction Using the Structural Topic Model on Online Customer Reviews

Sustainability, Vol 12 (7), pp. 2843, 2020. DOI: 10.3390/su12072843. Cited By ~ 3

Author(s): Eunhye (Olivia) Park, Bongsug (Kevin) Chae, Junehee Kwon, Woo-Hyuk Kim

Keyword(s): Customer Satisfaction, Topic Model, Marketing Strategies, Restaurant Industry, User Generated Content, Customer Reviews, Online Customer Reviews, Longitudinal Approach, The Common, Structural Topic Modeling

Although green practices are increasingly adopted in the restaurant industry, little research has investigated their impact on customer satisfaction. This study used user-generated content from green restaurant customers to identify various aspects of green restaurants, including perceived green restaurant practices. Our data are based on U.S. green-certified restaurants available on Yelp. Structural topic modeling was used to discover latent restaurant attributes from user-generated content. With a longitudinal approach, the changes in customers’ interest in green practices were estimated. Finally, the common restaurant attributes and green attributes were used to predict customer satisfaction. This study will contribute to marketing strategies for the restaurant industry.
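The longitudinal step can be pictured as tracking how much of each review is devoted to a topic over time. The sketch below is a descriptive simplification, not the study's procedure: a structural topic model estimates covariate effects inside the model, whereas this example only averages already-estimated topic shares per year. The review data, topic names, and function name are hypothetical.

```python
from collections import defaultdict


def mean_topic_share_by_year(reviews, topic):
    """Average the share of one topic over all reviews written in each year.

    reviews: iterable of (year, {topic_name: share}) pairs
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for year, shares in reviews:
        totals[year] += shares.get(topic, 0.0)
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}


# Hypothetical per-review topic shares, as a topic model might output them.
reviews = [
    (2016, {"food": 0.8, "green_practice": 0.2}),
    (2016, {"food": 0.9, "green_practice": 0.1}),
    (2018, {"food": 0.6, "green_practice": 0.4}),
    (2018, {"food": 0.5, "green_practice": 0.5}),
]

trend = mean_topic_share_by_year(reviews, "green_practice")
print(trend)
```

A rising average share across years would indicate growing customer attention to the topic in this toy setting.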


Jointly Predicting Future Content in Multiple Social Media Sites Based on Multi-task Learning

ACM Transactions on Information Systems, Vol 40 (4), pp. 1-28, 2022. DOI: 10.1145/3495530

Author(s): Peng Zhang, Baoxi Liu, Tun Lu, Xianghua Ding, Hansu Gu, ...

Keyword(s): Social Media, User Behavior, Learning Method, Data Sampling, Behavior Prediction, Fine Grained, Task Learning, Social Media Site, Site User, Cross Site

User-generated content (UGC) in social media is the direct expression of users’ interests, preferences, and opinions. User behavior prediction based on UGC has increasingly been investigated in recent years. Compared to learning a person’s behavioral patterns in each social media site separately, jointly predicting user behavior in multiple social media sites so that the predictions complement each other (cross-site user behavior prediction) can be more accurate. However, cross-site user behavior prediction based on UGC is a challenging task due to the difficulty of cross-site data sampling, the complexity of UGC modeling, and the uncertainty of knowledge sharing among different sites. To address these problems, we propose a Cross-Site Multi-Task (CSMT) learning method to jointly predict user behavior in multiple social media sites. CSMT mainly derives from the hierarchical attention network and multi-task learning. Using this method, the UGC in each social media site can obtain fine-grained representations in terms of words, topics, posts, hashtags, and time slices as well as the relevance among them, and prediction tasks in different social media sites can be jointly implemented and complement each other. By utilizing two cross-site datasets sampled from Weibo, Douban, Facebook, and Twitter, we validate our method’s superiority on several classification metrics compared with existing related methods.
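The two ingredients named above, attention and multi-task learning, can be illustrated at a very small scale. The sketch below is not the CSMT architecture; it only shows one attention-pooling layer that turns several post vectors into a shared user representation, followed by one logistic prediction head per site. All vectors, the query, the head weights, and the function names are hypothetical, and no training is performed.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, query):
    """Weight each vector by softmax(vector . query) and sum the results."""
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def predict_per_site(post_vectors, query, site_heads):
    """Share one pooled representation across site-specific logistic heads."""
    shared, _ = attention_pool(post_vectors, query)
    return {
        site: 1.0 / (1.0 + math.exp(-dot(shared, head)))
        for site, head in site_heads.items()
    }


# Hypothetical two-dimensional post embeddings for one user.
posts = [[1.0, 0.0], [0.0, 1.0]]
query = [0.0, 0.0]  # a zero query yields uniform attention weights
heads = {"weibo": [1.0, 1.0], "douban": [-1.0, -1.0]}

pooled, weights = attention_pool(posts, query)
predictions = predict_per_site(posts, query, heads)
print(pooled, weights, predictions)
```

Sharing the pooled representation while keeping a separate head per site is the simplest form of multi-task parameter sharing; a hierarchical model would repeat the pooling step at the word, post, and time-slice levels.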


An Adaptive Cross-Site User Modelling Platform for Cultural Heritage Websites

Communications in Computer and Information Science - Digital Libraries and Archives, pp. 132-141, 2017. DOI: 10.1007/978-3-319-68130-6_11. Cited By ~ 1

Author(s): Maristella Agosti, Séamus Lawless, Stefano Marchesin, Vincent Wade

Keyword(s): Cultural Heritage, User Modelling, Site User, Cross Site


Modeling method of internet public information data mining based on probabilistic topic model

The Journal of Supercomputing, Vol 75 (9), pp. 5882-5897, 2019. DOI: 10.1007/s11227-019-02885-8. Cited By ~ 10

Author(s): Shaofei Wu, Jun Liu, Lizhi Liu

Keyword(s): Data Mining, Topic Model, Public Information, Modeling Method, Probabilistic Topic Model


Exploring Eating Disorder Topics on Twitter: Machine Learning Approach

JMIR Medical Informatics, Vol 8 (10), pp. e18273, 2020. DOI: 10.2196/18273

Author(s): Sicheng Zhou, Yunpeng Zhao, Jiang Bian, Ann F Haynos, Rui Zhang

Keyword(s): Machine Learning, Topic Modeling, Short Term Memory, Topic Model, Modeling Method, Mental Illnesses, Computational Method, Supervised Machine Learning, Support Vector, Domain Expert

Background: Eating disorders (EDs) are a group of mental illnesses that have an adverse effect on both mental and physical health. As social media platforms (eg, Twitter) have become an important data source for public health research, some studies have qualitatively explored the ways in which EDs are discussed on these platforms. Initial results suggest that such research offers a promising method for further understanding this group of diseases. Nevertheless, an efficient computational method is needed to further identify and analyze tweets relevant to EDs on a larger scale.

Objective: This study aims to develop and validate a machine learning–based classifier to identify tweets related to EDs and to explore factors (ie, topics) related to EDs using a topic modeling method.

Methods: We collected potential ED-relevant tweets using keywords from previous studies and annotated these tweets into different groups (ie, ED relevant vs irrelevant and then promotional information vs laypeople discussion). Several supervised machine learning methods, such as convolutional neural network (CNN), long short-term memory (LSTM), support vector machine, and naïve Bayes, were developed and evaluated using annotated data. We used the classifier with the best performance to identify ED-relevant tweets and applied a topic modeling method, Correlation Explanation (CorEx), to analyze the content of the identified tweets. To validate these machine learning results, we also collected a cohort of ED-relevant tweets on the basis of manually curated rules.

Results: A total of 123,977 tweets were collected during the set period. We randomly annotated 2219 tweets for developing the machine learning classifiers. We developed a CNN-LSTM classifier to identify ED-relevant tweets published by laypeople in 2 steps: first relevant versus irrelevant (F1 score=0.89) and then promotional versus published by laypeople (F1 score=0.90). A total of 40,790 ED-relevant tweets were identified using the CNN-LSTM classifier. We also identified another set of tweets (ie, 17,632 ED-relevant and 83,557 ED-irrelevant tweets) posted by laypeople using manually specified rules. Using CorEx on all ED-relevant tweets, the topic model identified 162 topics. Overall, the coherence rate for topic modeling was 77.07% (1264/1640), indicating a high quality of the produced topics. The topics were further reviewed and analyzed by a domain expert.

Conclusions: A developed CNN-LSTM classifier could improve the efficiency of identifying ED-relevant tweets compared with the traditional manual-based method. The CorEx topic model was applied on the tweets identified by the machine learning–based classifier and the traditional manual approach separately. Highly overlapping topics were observed between the 2 cohorts of tweets. The produced topics were further reviewed by a domain expert. Some of the topics identified by the potential ED tweets may provide new avenues for understanding this serious set of disorders.
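The 2-step identification described in the Results (relevant versus irrelevant, then promotional versus laypeople) is a filtering cascade. The sketch below shows only the control flow, with hypothetical keyword rules standing in for the trained classifiers; the keyword sets, example tweets, and function names are illustrative assumptions and are not the study's keywords, rules, or models.

```python
def cascade(tweets, is_relevant, is_promotional):
    """Step 1 keeps relevant tweets; step 2 drops promotional ones."""
    relevant = [t for t in tweets if is_relevant(t)]
    return [t for t in relevant if not is_promotional(t)]


# Hypothetical keyword rules used as stand-ins for the two classifiers.
ED_TERMS = {"anorexia", "bulimia"}
PROMO_TERMS = {"buy", "discount"}


def rule_relevant(tweet):
    words = set(tweet.lower().split())
    return bool(words & ED_TERMS)


def rule_promotional(tweet):
    words = set(tweet.lower().split())
    return bool(words & PROMO_TERMS)


tweets = [
    "Reading about anorexia recovery today",
    "Buy our bulimia awareness book, discount code inside",
    "Great pizza tonight",
]

laypeople_tweets = cascade(tweets, rule_relevant, rule_promotional)
print(laypeople_tweets)
```

Because the cascade accepts any pair of predicate functions, the same flow works whether the predicates are keyword rules, as here, or the predict methods of trained models.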


Coupled Topic Model for Collaborative Filtering With User-Generated Content

IEEE Transactions on Human-Machine Systems, Vol 46 (6), pp. 908-920, 2016. DOI: 10.1109/thms.2016.2586480. Cited By ~ 7

Author(s): Shu Wu, Weiyu Guo, Song Xu, Yongzhen Huang, Liang Wang, ...

Keyword(s): Collaborative Filtering, Topic Model, User Generated Content


What Are the Salient and Memorable Green-Restaurant Attributes? Capturing Customer Perceptions From User-Generated Content

SAGE Open, Vol 11 (3), pp. 215824402110315, 2021. DOI: 10.1177/21582440211031546

Author(s): Eunhye Park, Junehee Kwon, Bongsug (Kevin) Chae, Sung-Bum Kim

Keyword(s): Qualitative Data, Topic Model, Online Reviews, Marketing Strategies, User Generated Content, Customer Perceptions, Modeling Methodology, Probabilistic Topic Model, Practical Implications, Structural Topic Model

This study aims to survey user-generated content (UGC) from diners in certified green restaurants, discover the green images they recall, and demonstrate the usefulness of applying a probabilistic topic model to comprehend customers’ perceptions. Postvisit online reviews (N = 28,098), in the form of unstructured texts from the TripAdvisor.com website, were used to find freely recalled green-restaurant images. These data were preprocessed with a structural topic model (STM) algorithm to select 51 relevant categories of images. These image categories were compared with the findings of previous studies to discover unique restaurant attributes. Furthermore, a topic-level network and a green-restaurant network were drawn to discover the most easily recallable image categories and their attributes. This machine-learning-based approach improved the reproducibility of unstructured data analyses, overcoming the subjectivity of qualitative data analysis. Theoretical and practical implications are offered for topic modeling methodology along with marketing strategies for restaurateurs.
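One common way to draw a topic-level network is to link topics whose proportions are positively correlated across documents. The abstract does not say how the networks were built, so the sketch below is an assumption-laden illustration: the per-review topic shares, topic names, threshold, and function names are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def topic_network(doc_topic, threshold=0.0):
    """Return edges between topics whose shares correlate above the threshold."""
    topics = list(doc_topic[0])
    columns = {t: [doc[t] for doc in doc_topic] for t in topics}
    return [
        (a, b)
        for a, b in combinations(topics, 2)
        if pearson(columns[a], columns[b]) > threshold
    ]


# Hypothetical topic shares for three reviews.
doc_topic = [
    {"local_food": 0.5, "recycling": 0.4, "service": 0.1},
    {"local_food": 0.1, "recycling": 0.1, "service": 0.8},
    {"local_food": 0.4, "recycling": 0.5, "service": 0.1},
]

edges = topic_network(doc_topic)
print(edges)
```

In this toy data the two green-related topics rise and fall together, so they are linked, while the service topic stays unconnected.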


HPAKE: Password Authentication Secure against Cross-Site User Impersonation

Cryptology and Network Security - Lecture Notes in Computer Science, pp. 279-298, 2009. DOI: 10.1007/978-3-642-10433-6_19. Cited By ~ 3

Author(s): Xavier Boyen

Keyword(s): Password Authentication, Site User, Cross Site
