A Topic Modeling based Approach for Mining Online Social Media Data

Author(s):  
Nimisha S. Fal Dessai ◽  
J. A. Laxminarayanan
Author(s):  
Jiexiong Duan ◽  
Weixin Zhai ◽  
Chengqi Cheng

The Shanghai New Year’s Eve stampede on 31 December 2014 caused 36 deaths and 47 other injuries, generating attention from around the world. This research aims to explore crowd aggregation from the perspective of Sina Weibo check-in data and to evaluate the potential of crowd detection based on social media data. We develop a framework using Weibo check-in data in three dimensions: the aggregation level of check-in data, the topic changes in posts, and the sentiment fluctuations of citizens. The results show that the number of check-ins in all of Shanghai on New Year’s Eve was twice that of other days and that Moran’s I reached a peak on this date, implying spatial autocorrelation in the check-ins. Additionally, the results of topic modeling indicate that 72.4% of the posts were related to the stampede, reflecting public attitudes and views on this incident from multiple angles. Moreover, sentiment analysis based on Weibo posts illustrates that the proportion of negative posts increased both when the stampede occurred (40.95%) and a few hours afterwards (44.33%). This study demonstrates the potential of using geotagged social media data to analyze population spatiotemporal activities, especially in emergencies.
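The abstract reports Moran’s I without stating how it was computed. As a minimal sketch of the standard global Moran’s I statistic, the following pure-Python function takes per-cell check-in counts and a spatial weight matrix. The function name `morans_i`, the four-cell grid, and the counts are all hypothetical illustrations, not data or code from the study.

```python
def morans_i(values, weights):
    """Global Moran's I for observations x_i and spatial weights w_ij.

    I = (n / sum_ij w_ij) * (sum_ij w_ij (x_i - mean)(x_j - mean)) / (sum_i (x_i - mean)^2)
    Diagonal weights (i == j) are ignored.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    w_sum = sum(weights[i][j] for i, j in pairs)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i, j in pairs)
    variance_sum = sum(d * d for d in dev)
    return (n / w_sum) * (cross / variance_sum)


# Hypothetical example: four grid cells in a row, neighbours share an edge.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [10, 9, 1, 0]      # high counts next to high counts
alternating = [10, 0, 10, 0]   # high counts next to low counts

i_clustered = morans_i(clustered, adjacency)
i_alternating = morans_i(alternating, adjacency)
print(round(i_clustered, 4), round(i_alternating, 4))
```

Values above zero indicate that similar counts sit next to each other (clustering), while values below zero indicate a checkerboard-like pattern. The choice of weight matrix (here, simple edge adjacency) is an assumption and affects the result.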


2018 ◽  
Author(s):  
Bernard J. Jansen ◽  
Soon-gyo Jung ◽  
Joni Salminen ◽  
Jisun An ◽  
Haewoon Kwak

2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Yeong-Hyeon Choi ◽  
Seungjoo Yoon ◽  
Bin Xuan ◽  
Sang-Yong Tom Lee ◽  
Kyu-Hye Lee

Abstract This study used several informatics techniques to analyze consumer-driven social media data from four cities (Paris, Milan, New York, and London) during the 2019 Fall/Winter (F/W) Fashion Week. Analyzing keywords using a semantic network analysis method revealed the main characteristics of the collections, celebrities, influencers, fashion items, fashion brands, and designers connected with the four fashion weeks. Using topic modeling and sentiment analysis, this study confirmed that brands that embodied similar themes in terms of topics and received positive sentiment were also the ones most frequently mentioned by consumers. A semantic network analysis of the tweets showed that social media, influencers, fashion brands, designers, and words related to sustainability and ethics were mentioned in all four cities. In our topic modeling, classifying the keywords into three topics based on the themes of the brand collections provided the most accurate model. To identify how the brands participating in the 2019 F/W Fashion Week were evaluated, we analyzed consumers’ sentiments in terms of positive, neutral, and negative reactions. The quantitative analysis of consumer-generated social media data in this study provides useful information that can help fashion brands improve their marketing strategies.
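The abstract does not describe how its semantic network was built. One common construction, shown here only as a hedged sketch, links two keywords whenever they appear in the same tweet and weights the link by the number of such tweets. The function names and the tokenized sample tweets are hypothetical, not taken from the study.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(token_lists):
    """Count how many documents each unordered keyword pair shares."""
    edges = Counter()
    for tokens in token_lists:
        # Deduplicate and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum of edge weights attached to each keyword."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Hypothetical, already-tokenized tweets (placeholders, not real data).
tweets = [
    ["brand_a", "milan", "sustainability"],
    ["brand_a", "sustainability"],
    ["paris", "influencer", "brand_a"],
]

edges = cooccurrence_edges(tweets)
degree = weighted_degree(edges)
print(edges.most_common(1))
print(degree.most_common(1))
```

Keywords with a high weighted degree are the ones most strongly connected to the rest of the vocabulary, which is one simple way to surface the central terms of a network.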


2020 ◽  
Vol 34 (01) ◽  
pp. 346-353 ◽  
Author(s):  
Mansi Agarwal ◽  
Maitree Leekha ◽  
Ramit Sawhney ◽  
Rajiv Ratn Shah

In times of disaster, the information available on social media can be useful for several humanitarian tasks, because messages on social media spread quickly and are easily accessible. Disaster damage assessment is inherently multi-modal, yet most existing work on damage identification has focused solely on building generic classification models that rely exclusively on text or image analysis of online social media sessions (e.g., posts). Despite their empirical success, these efforts ignore the multi-modal information manifested in social media data. Conventionally, when information from various modalities is presented together, it often exhibits complementary insights about the application domain and facilitates better learning performance. In this work, we present Crisis-DIAS, a multi-modal sequential damage identification and severity detection system. We aim to support disaster management and aid in planning by analyzing and exploiting the impact of linguistic cues on a unimodal visual system. Through extensive qualitative, quantitative, and theoretical analysis on a real-world multi-modal social media dataset, we show that the Crisis-DIAS framework is superior to state-of-the-art damage assessment models in terms of bias, responsiveness, computational efficiency, and assessment performance.


2016 ◽  
Vol 28 (3) ◽  
pp. 268-274 ◽  
Author(s):  
Feng Yu ◽  
Theodore Peng ◽  
Kaiping Peng ◽  
Sam Xianjun Zheng ◽  
Zhiyuan Liu

10.2196/14763 ◽  
2019 ◽  
Vol 7 (3) ◽  
pp. e14763 ◽  
Author(s):  
Jin-Young Min ◽  
Sung-Hee Song ◽  
HyeJin Kim ◽  
Kyoung-Bok Min

Background: Although injured employees are legally covered by workers’ compensation insurance in South Korea, some employers make agreements to prevent the injured employees from claiming their compensation. This leads to underreporting in occupational injury statistics. Illegal compensation (called gong-sang in Korean) is a critical method used to underreport or cover up occupational injuries. However, gong-sang is not counted in the official occupational injury statistics; therefore, we cannot identify gong-sang–related issues. Objective: This study aimed to analyze social media data using topic modeling to explore hidden knowledge about illegal compensation (gong-sang) for occupational injury in South Korea. Methods: We collected 2210 documents from social media data by filtering on the keyword gong-sang. The study period was between January 1, 2006, and December 31, 2017. After processing the Korean text with a morphological analyzer, we performed topic modeling using latent Dirichlet allocation (LDA) in the Python library Gensim. A 10-topic model was selected and fitted with 3000 Gibbs sampling iterations. Results: The LDA model was used to classify gong-sang–related documents into 4 categories from a total of 10 topics. Topic 1 was the greatest concern (60.5%). Workers who suffered from industrial accidents seemed to be worried about illegal compensation and legal insurance claims; the keywords in this topic concerned the choice between the two. In topic 2, keywords were associated with claims for industrial accident insurance benefits. Topics 3 and 4, the second highest concern (19%), contained keywords implying the monetary compensation of gong-sang. Topics 5 to 10 included keywords on jobs vulnerable to gong-sang (ie, workers in the construction and defense industry, delivery riders, and foreign workers) and on the body parts involved (ie, injuries to the hands, face, teeth, lower limbs, and back). Conclusions: We explored hidden knowledge to identify the salient issues surrounding gong-sang using the LDA model. These topics may provide valuable information to ensure the more efficient operation of South Korea’s occupational health and safety administration and protect vulnerable workers from illegal gong-sang compensation practices.
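To make the phrase "Gibbs sampling iterations" concrete, here is a minimal pure-Python sketch of collapsed Gibbs sampling for LDA. It is not the authors' Gensim pipeline. The function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the seed, and the toy documents are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA on tokenized documents.

    Returns (doc_topic counts, topic_word counts, vocabulary list).
    alpha and beta are symmetric Dirichlet priors (assumed values).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                probs = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=probs)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return doc_topic, topic_word, vocab


# Hypothetical toy corpus (English placeholders, not the study's data).
toy_docs = [
    ["injury", "claim", "insurance", "claim"],
    ["insurance", "claim", "benefit"],
    ["hand", "injury", "construction"],
    ["construction", "worker", "hand"],
]
doc_topic, topic_word, vocab = lda_gibbs(toy_docs, num_topics=2, iterations=100)
for k, row in enumerate(topic_word):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print(k, [word for _, word in top])
```

Each iteration resamples the topic of every token given all other assignments, so more iterations give the chain more time to settle. A study such as the one described would additionally need Korean morphological analysis before this step, which is outside the scope of the sketch.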

