Crash tags: Topic modeling social media data after fatal automated vehicle crashes

The Shanghai New Year’s Eve stampede on 31 December 2014 caused 36 deaths and 47 other injuries, generating attention from around the world. This research aims to explore crowd aggregation from the perspective of Sina Weibo check-in data and to evaluate the potential of crowd detection based on social media data. We develop a framework that uses Weibo check-in data in three dimensions: the aggregation level of check-in data, the topic changes in posts, and the sentiment fluctuations of citizens. The results show that the number of check-ins in all of Shanghai on New Year’s Eve was twice that of other days and that Moran’s I reached a peak on this date, implying a spatial autocorrelation mode. Additionally, the results of topic modeling indicate that 72.4% of the posts were related to the stampede, reflecting public attitudes and views on this incident from multiple angles. Moreover, sentiment analysis based on Weibo posts illustrates that the proportion of negative posts increased both when the stampede occurred (40.95%) and a few hours afterwards (44.33%). This study demonstrates the potential of using geotagged social media data to analyze population spatiotemporal activities, especially in emergencies.
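The abstract above reports that Moran’s I peaked on the date of the stampede. As a rough illustration of what that statistic measures, the following is a minimal sketch of global Moran’s I in plain Python. The function name `morans_i`, the four-cell grid, the binary adjacency weights, and the check-in counts are all hypothetical and are not taken from the study, whose spatial units and weighting scheme are not stated here.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and spatial weights w_ij (w_ii = 0)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    # Sum of all spatial weights (W in the usual formula).
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between every pair of cells.
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    # Sum of squared deviations.
    squared = sum(d * d for d in dev)
    return (n / total_weight) * (cross / squared)


# Hypothetical toy data: four cells in a row; cells sharing an edge are neighbours.
line_weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [10, 9, 1, 2]      # high counts sit next to high counts
alternating = [10, 1, 10, 1]   # high counts sit next to low counts

print(morans_i(clustered, line_weights))
print(morans_i(alternating, line_weights))
```

In this toy setup the clustered counts give a positive value and the alternating counts give a negative one, which is the sense in which a high Moran’s I indicates spatially concentrated check-ins.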


Detecting information requirements for crisis communication from social media data: An interactive topic modeling approach

International Journal of Disaster Risk Reduction ◽

10.1016/j.ijdrr.2020.101692 ◽

2020 ◽

Vol 50 ◽

pp. 101692

Author(s):

Qing Deng ◽

Yang Gao ◽

Chenyang Wang ◽

Hui Zhang

Keyword(s):

Social Media ◽

Crisis Communication ◽

Topic Modeling ◽

Information Requirements ◽

Social Media Data ◽

Modeling Approach ◽

Media Data


textPrep: A Text Preprocessing Toolkit for Topic Modeling on Social Media Data

10.5220/0010559000002993 ◽

2021 ◽

Author(s):

Rob Churchill ◽

Lisa Singh

Keyword(s):

Social Media ◽

Topic Modeling ◽

Social Media Data ◽

Text Preprocessing ◽

Media Data


Fashion informatics of the Big 4 Fashion Weeks using topic modeling and sentiment analysis

Fashion and Textiles ◽

10.1186/s40691-021-00265-6 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Yeong-Hyeon Choi ◽

Seungjoo Yoon ◽

Bin Xuan ◽

Sang-Yong Tom Lee ◽

Kyu-Hye Lee

Keyword(s):

New York ◽

Social Media ◽

Network Analysis ◽

Sentiment Analysis ◽

Topic Modeling ◽

Semantic Network ◽

Semantic Network Analysis ◽

Social Media Data ◽

Big 4 ◽

Media Data

Abstract This study used several informatics techniques to analyze consumer-driven social media data from four cities (Paris, Milan, New York, and London) during the 2019 Fall/Winter (F/W) Fashion Week. Analyzing keywords using a semantic network analysis method revealed the main characteristics of the collections, celebrities, influencers, fashion items, fashion brands, and designers connected with the four fashion weeks. Using topic modeling and a sentiment analysis, this study confirmed that brands that embodied similar themes in terms of topics and had positive sentimental reactions were also most frequently mentioned by the consumers. A semantic network analysis of the tweets showed that social media, influencers, fashion brands, designers, and words related to sustainability and ethics were mentioned in all four cities. In our topic modeling, the classification of the keywords into three topics based on the brand collection’s themes provided the most accurate model. To identify the sentimental evaluation of brands participating in the 2019 F/W Fashion Week, we analyzed the consumers’ sentiments through positive, neutral, and negative reactions. This quantitative analysis of consumer-generated social media data through this study provides insight into useful information enabling fashion brands to improve their marketing strategies.


Mining Hidden Knowledge About Illegal Compensation for Occupational Injury: Topic Model Approach

JMIR Medical Informatics ◽

10.2196/14763 ◽

2019 ◽

Vol 7 (3) ◽

pp. e14763 ◽

Cited By ~ 2

Author(s):

Jin-Young Min ◽

Sung-Hee Song ◽

HyeJin Kim ◽

Kyoung-Bok Min

Keyword(s):

Social Media ◽

South Korea ◽

Topic Modeling ◽

Occupational Injury ◽

Topic Model ◽

Insurance Claims ◽

Efficient Operation ◽

Social Media Data ◽

Hidden Knowledge ◽

Media Data

Background Although injured employees are legally covered by workers’ compensation insurance in South Korea, some employers make agreements to prevent the injured employees from claiming their compensation. This leads to underreporting in occupational injury statistics. Illegal compensation (called gong-sang in Korean) is a critical method used to underreport or cover up occupational injuries. However, gong-sang is not counted in the official occupational injury statistics; therefore, we cannot identify gong-sang–related issues. Objective This study aimed to analyze social media data using topic modeling to explore hidden knowledge about illegal compensation (gong-sang) for occupational injury in South Korea. Methods We collected 2210 documents from social media data by filtering on the keyword gong-sang. The study period was between January 1, 2006, and December 31, 2017. After natural language processing of the Korean text with a morphological analyzer, we performed topic modeling using latent Dirichlet allocation (LDA) in the Python library Gensim. A 10-topic model was selected and fitted with 3000 Gibbs sampling iterations. Results The LDA model was used to classify gong-sang–related documents into 4 categories from a total of 10 topics. Topic 1 was the greatest concern (60.5%). Workers who suffered from industrial accidents seemed to be worried about illegal compensation and legal insurance claims, wherein keywords on the choice between illegal compensation and legal insurance claims were included. In topic 2, keywords were associated with claims for industrial accident insurance benefits. Topics 3 and 4, as the second highest concern (19%), contained keywords implying the monetary compensation of gong-sang. Topics 5 to 10 included keywords on jobs (ie, workers in the construction and defense industry, delivery riders, and foreign workers) and body parts (ie, injuries to the hands, face, teeth, lower limbs, and back) vulnerable to gong-sang. Conclusions We explored hidden knowledge to identify the salient issues surrounding gong-sang using the LDA model. These topics may provide valuable information to ensure the more efficient operation of South Korea’s occupational health and safety administration and protect vulnerable workers from illegal gong-sang compensation practices.
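The abstract above describes fitting an LDA topic model with Gibbs sampling iterations. To make that step concrete, here is a minimal sketch of a collapsed Gibbs sampler for LDA in plain Python. It is a stand-in for illustration only and is not the authors’ Gensim pipeline: the function `lda_gibbs`, the hyperparameters `alpha` and `beta`, the toy English tokens, and the small topic and iteration counts are all assumptions, and the Korean morphological analysis step is omitted.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized documents (lists of words)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][n]
                # Remove the current token from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic for this token (unnormalized).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                # Record the newly sampled topic.
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word, vocab


# Hypothetical toy corpus; the study used 2210 Korean documents and 10 topics.
toy_docs = [
    ["insurance", "claim", "accident", "claim"],
    ["claim", "insurance", "benefit"],
    ["hand", "injury", "construction", "hand"],
    ["injury", "back", "construction"],
]
theta, topic_word, vocab = lda_gibbs(toy_docs, num_topics=2, iterations=50)
for row in theta:
    print([round(p, 2) for p in row])
```

Each printed row is one document’s estimated topic mixture. A share such as the 60.5% reported for topic 1 would come from aggregating document-level assignments of this kind, although the exact aggregation the authors used is not stated in the abstract.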


textPrep: A Text Preprocessing Toolkit for Topic Modeling on Social Media Data

Proceedings of the 10th International Conference on Data Science, Technology and Applications ◽

10.5220/0010559000600070 ◽

2021 ◽

Author(s):

Rob Churchill ◽

Lisa Singh

Keyword(s):

Social Media ◽

Topic Modeling ◽

Social Media Data ◽

Text Preprocessing ◽

Media Data


A Social Media Study on the Associations of Flavored Electronic Cigarettes With Health Symptoms: Observational Study (Preprint)

10.2196/preprints.17496 ◽

2019 ◽

Author(s):

Long Chen ◽

Xinyi Lu ◽

Jianbo Yuan ◽

Joyce Luo ◽

Jiebo Luo ◽

...

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Topic Modeling ◽

Electronic Cigarettes ◽

Temporal Analysis ◽

Estimating Equation ◽

Social Media Data ◽

Health Symptoms ◽

Reported Health ◽

Media Data

BACKGROUND In recent years, flavored electronic cigarettes (e-cigarettes) have become popular among teenagers and young adults. Discussions about e-cigarettes and e-cigarette use (vaping) experiences are prevalent online, making social media an ideal resource for understanding the health risks associated with e-cigarette flavors from the users’ perspective. OBJECTIVE This study aimed to investigate the potential associations between electronic cigarette liquid (e-liquid) flavors and the reporting of health symptoms using social media data. METHODS A dataset consisting of 2.8 million e-cigarette–related posts was collected using keyword filtering from Reddit, a social media platform, from January 2013 to April 2019. Temporal analysis for nine major health symptom categories was used to understand the trend of public concerns related to e-cigarettes. Sentiment analysis was conducted to obtain the proportions of positive and negative sentiment scores for all reported health symptom categories. Topic modeling was applied to reveal the topics related to e-cigarettes and health symptoms. Furthermore, generalized estimating equation (GEE) models were used to quantitatively measure potential associations between e-liquid flavors and the reporting of health symptoms. RESULTS Temporal analysis showed that the Respiratory category was consistently the most discussed health symptom category among all categories related to e-cigarettes on Reddit, followed by the Throat category. Sentiment analysis showed higher proportions of positive sentiment scores for all reported health symptom categories, except for the Cancer category. Topic modeling conducted on all health-related posts showed that 17 of the top 100 topics were flavor related. 
GEE models showed different associations between the reporting of health symptoms and e-liquid flavor categories, for example, lower association of the Beverage flavors with Respiratory compared with other flavors and higher association of the Fruit flavors with Cardiovascular than other flavors. CONCLUSIONS This study identified different potential associations between e-liquid flavors and the reporting of health symptoms using social media data. The results of this study provide valuable information for further investigation of the health effects associated with different e-liquid flavors.


Mining Hidden Knowledge About Illegal Compensation for Occupational Injury: Topic Model Approach (Preprint)

10.2196/preprints.14763 ◽

2019 ◽

Author(s):

Jin-Young Min ◽

Sung-Hee Song ◽

HyeJin Kim ◽

Kyoung-Bok Min

Keyword(s):

Social Media ◽

South Korea ◽

Topic Modeling ◽

Occupational Injury ◽

Topic Model ◽

Insurance Claims ◽

Efficient Operation ◽

Social Media Data ◽

Hidden Knowledge ◽

Media Data

BACKGROUND Although injured employees are legally covered by workers’ compensation insurance in South Korea, some employers make agreements to prevent the injured employees from claiming their compensation. This leads to underreporting in occupational injury statistics. Illegal compensation (called gong-sang in Korean) is a critical method used to underreport or cover up occupational injuries. However, gong-sang is not counted in the official occupational injury statistics; therefore, we cannot identify gong-sang–related issues. OBJECTIVE This study aimed to analyze social media data using topic modeling to explore hidden knowledge about illegal compensation (gong-sang) for occupational injury in South Korea. METHODS We collected 2210 documents from social media data by filtering on the keyword gong-sang. The study period was between January 1, 2006, and December 31, 2017. After natural language processing of the Korean text with a morphological analyzer, we performed topic modeling using latent Dirichlet allocation (LDA) in the Python library Gensim. A 10-topic model was selected and fitted with 3000 Gibbs sampling iterations. RESULTS The LDA model was used to classify gong-sang–related documents into 4 categories from a total of 10 topics. Topic 1 was the greatest concern (60.5%). Workers who suffered from industrial accidents seemed to be worried about illegal compensation and legal insurance claims, wherein keywords on the choice between illegal compensation and legal insurance claims were included. In topic 2, keywords were associated with claims for industrial accident insurance benefits. Topics 3 and 4, as the second highest concern (19%), contained keywords implying the monetary compensation of gong-sang. Topics 5 to 10 included keywords on jobs (ie, workers in the construction and defense industry, delivery riders, and foreign workers) and body parts (ie, injuries to the hands, face, teeth, lower limbs, and back) vulnerable to gong-sang. CONCLUSIONS We explored hidden knowledge to identify the salient issues surrounding gong-sang using the LDA model. These topics may provide valuable information to ensure the more efficient operation of South Korea’s occupational health and safety administration and protect vulnerable workers from illegal gong-sang compensation practices.


Topic modeling to mine illegal compensation for occupational injuries

European Journal of Public Health ◽

10.1093/eurpub/ckz186.317 ◽

2019 ◽

Vol 29 (Supplement_4) ◽

Author(s):

S H Song ◽

J Y Min ◽

H J Kim ◽

K B Min

Keyword(s):

Social Media ◽

Topic Modeling ◽

Social Insurance ◽

Latent Dirichlet Allocation ◽

Occupational Injuries ◽

Workplace Safety ◽

Body Parts ◽

Insurance Claims ◽

Social Media Data ◽

Media Data

Abstract Background Accurate reports of occupational injuries are important to monitor workplace safety and health initiatives. In South Korea, media reports, experts, and workers have been constantly raising the issue of underreporting. Supposedly this is because employers have strong market “incentives” to underreport their employees’ injuries. A critical way to underreport or cover up injuries is illegal compensation (in Korean called “gong-sang”). Unfortunately, “gong-sang” is not counted in official occupational injury statistics. The aim of this study was to analyze social media data using topic modeling and to explore issues surrounding “gong-sang”. Methods We used web scraping technology and collected 2,210 social media documents from Web search engines. The data were processed in Python to transform unstructured textual documents into structured data, and latent Dirichlet allocation (LDA) in the Python library Gensim was applied for topic modeling. Results Applying the LDA method to “gong-sang”-related documents identified 10 topics. Topic 1 was the greatest concern (60.5%), with keywords implying the choice between illegal compensation (“gong-sang”) and legal insurance claims. The next concern was Topic 2, which included keywords associated with claims for industrial accident insurance benefits. The remaining topics (topics 3-10) covered monetary issues, precarious employment, and body parts vulnerable to “gong-sang”. Conclusions We explored web-based data and identified the salient issues surrounding “gong-sang”. LDA topics may help ensure an efficient occupational health and safety scheme that protects vulnerable employees from “gong-sang” practices. Key messages The topics formulated by LDA included queries about legal insurance claims, including private or social insurance, as well as monetary compensation, injured body parts, and the types of jobs vulnerable to “gong-sang”.
