Data Mining and Content Analysis of the Chinese Social Media Platform Weibo During the Early COVID-19 Outbreak: Retrospective Observational Infoveillance Study

Background The coronavirus disease (COVID-19) pandemic, which began in Wuhan, China, in December 2019, is spreading rapidly worldwide, with over 1.9 million cases as of mid-April 2020. Infoveillance approaches using social media can help characterize disease distribution as well as the public knowledge, attitudes, and behaviors that are critical in the early stages of an outbreak. Objective The aim of this study is to conduct a quantitative and qualitative assessment of Chinese social media posts originating in Wuhan City on the Chinese microblogging platform Weibo during the early stages of the COVID-19 outbreak. Methods Chinese-language messages from Wuhan were collected on Weibo for 39 days, between December 23, 2019, and January 30, 2020. For quantitative analysis, the total daily COVID-19 cases in Wuhan were obtained from the Chinese National Health Commission, and a linear regression model was used to determine whether Weibo COVID-19 posts were predictive of the number of cases reported. Qualitative content analysis and an inductive manual coding approach were used to identify parent classifications of news and user-generated COVID-19 topics. Results A total of 115,299 Weibo posts were collected during the study period, an average of 2956 posts per day (minimum 0, maximum 13,587). Quantitative analysis found a positive correlation between the number of Weibo posts and the number of reported cases in Wuhan, with approximately 10 additional COVID-19 cases per 40 social media posts (P<.001). This effect size was also larger than that observed for the rest of China excluding Hubei Province (of which Wuhan is the capital city), and the association held when the number of Weibo posts was compared with the incidence proportion of cases in Hubei Province.
Qualitative analysis of 11,893 posts from the first 21 days of the study period on which COVID-19-related posts appeared uncovered four parent classifications: Weibo discussions about the causative agent of the disease, the changing epidemiological characteristics of the outbreak, public reaction to outbreak control and response measures, and other topics. Generally, these themes also reflected public uncertainty and changing knowledge and attitudes about COVID-19, including posts exhibiting both protective and higher-risk behaviors. Conclusions The results of this study provide initial insight into the origins of the COVID-19 outbreak based on quantitative and qualitative analysis of Chinese social media data from the initial epicenter in Wuhan City. Future studies should continue to explore the utility of social media data for predicting COVID-19 disease severity, measuring public reaction and behavior, and evaluating the effectiveness of outbreak communication.
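The regression step described in the Methods can be illustrated with a one-predictor ordinary least squares fit of daily reported cases on daily Weibo post counts. This is a minimal sketch, assuming two equal-length daily series; the numbers are made-up placeholders rather than the study's data, and the authors' exact model specification may differ. A slope of 0.25 cases per post is what "approximately 10 more cases per 40 posts" corresponds to.

```python
from statistics import mean


def fit_simple_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance; slope is undefined")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical daily series (placeholders, NOT the study's data).
daily_posts = [0, 40, 80, 120, 160]
daily_cases = [5, 15, 25, 35, 45]

intercept, slope = fit_simple_ols(daily_posts, daily_cases)
print(f"intercept={intercept}, slope={slope}, cases per 40 posts={40 * slope}")
# prints: intercept=5.0, slope=0.25, cases per 40 posts=10.0
```

A significance test on the slope (the abstract reports P<.001) would additionally need the slope's standard error and a t distribution, which a statistics package would normally supply.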


Data Mining and Content Analysis of the Chinese Social Media Platform Weibo During the Early COVID-19 Outbreak: Retrospective Observational Infoveillance Study (Preprint)

10.2196/preprints.18700 ◽

2020 ◽

Cited By ~ 8

Author(s):

Jiawei Li ◽

Qing Xu ◽

Raphael Cuomo ◽

Vidya Purushothaman ◽

Tim Mackey

Keyword(s):

Social Media ◽

Content Analysis ◽

Quantitative Analysis ◽

Qualitative Analysis ◽

Hubei Province ◽

Wuhan City ◽

Social Media Data ◽

Public Reaction ◽

Media Data ◽

Chinese Social Media



Inferring Atmospheric Particulate Matter Concentrations from Chinese Social Media Data

PLoS ONE ◽

10.1371/journal.pone.0161389 ◽

2016 ◽

Vol 11 (9) ◽

pp. e0161389 ◽

Cited By ~ 9

Author(s):

Zhu Tao ◽

Aynne Kokas ◽

Rui Zhang ◽

Daniel S. Cohan ◽

Dan Wallach

Keyword(s):

Social Media ◽

Particulate Matter ◽

Atmospheric Particulate Matter ◽

Atmospheric Particulate ◽

Social Media Data ◽

Media Data ◽

Chinese Social Media


Applying sentiment analytics to examine social media crises: a case study of United Airline's crisis in 2017

Data Technologies and Applications ◽

10.1108/dta-09-2018-0087 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Xin Tian ◽

Wu He ◽

Feng-Kwei Wang

Keyword(s):

Social Media ◽

Sentiment Analysis ◽

Social Media Data ◽

Content Type ◽

The Public ◽

Response Strategies ◽

Related Information ◽

Valence And Arousal ◽

Public Reaction ◽

Media Data

Purpose In recent years, social media crises have occurred increasingly often, damaging the reputations of individuals, businesses and communities. During each crisis, numerous users either took part in online discussion or widely spread crisis-related information to their friends and followers on social media. By applying sentiment analysis to a social media crisis involving airline carriers, this research aims to help companies take measures against social media crises. Design/methodology/approach This study used sentiment analytics to examine a social media crisis related to airline carriers. Arousal, valence, negative and positive sentiment, and eight emotional sentiments were applied to analyze social media data collected from Twitter. Findings This study found that social media sentiment analysis is useful for monitoring public reaction after a social media crisis arises. The sentiment results reflect the development of social media crises quite well. Proper and timely response strategies can mitigate a crisis through effective communication with customers and the public. Originality/value This study used the Affective Norms for English Words (ANEW) dictionary to classify the words in social media data and assigned each word two dimensions for measuring emotion: valence and arousal. The intensity of the sentiment determines the public reaction to a social media crisis. The paper proposes an opinion-oriented information system as a solution for resolving a social media crisis.
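Lexicon-based valence/arousal scoring of the kind described above amounts to a dictionary lookup followed by averaging. The sketch below assumes a simple English regex tokenizer and a hypothetical four-word lexicon; its ratings are placeholders on a 1 to 9 scale, not actual ANEW norms, and the paper's own aggregation may differ.

```python
import re
from typing import Optional

# Hypothetical mini-lexicon: word -> (valence, arousal) on a 1-9 scale.
# These ratings are illustrative placeholders, NOT real ANEW norms.
LEXICON: dict[str, tuple[float, float]] = {
    "angry": (2.0, 7.0),
    "delay": (3.0, 5.0),
    "great": (8.0, 6.0),
    "calm": (7.0, 2.0),
}


def score_text(text: str,
               lexicon: dict[str, tuple[float, float]] = LEXICON) -> Optional[tuple[float, float]]:
    """Average (valence, arousal) over the lexicon words found in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return None  # the post contains no lexicon words
    valence = sum(v for v, _ in hits) / len(hits)
    arousal = sum(a for _, a in hits) / len(hits)
    return valence, arousal


def mean_scores(posts: list[str]) -> Optional[tuple[float, float]]:
    """Average the per-post scores, skipping posts with no lexicon coverage."""
    scored = [s for s in map(score_text, posts) if s is not None]
    if not scored:
        return None
    return (sum(v for v, _ in scored) / len(scored),
            sum(a for _, a in scored) / len(scored))


print(score_text("Angry about the delay"))                            # (2.5, 6.0)
print(mean_scores(["Angry about the delay", "Great and calm crew"]))  # (5.0, 5.0)
```

Computing `mean_scores` per day over a crisis window would give a simple time series for tracking how public reaction develops.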


Social Media Data-Based Sentiment Analysis of Tourists’ Air Quality Perceptions

Sustainability ◽

10.3390/su11185070 ◽

2019 ◽

Vol 11 (18) ◽

pp. 5070 ◽

Cited By ~ 3

Author(s):

Yuguo Tao ◽

Feng Zhang ◽

Chunyun Shi ◽

Yun Chen

Keyword(s):

Machine Learning ◽

Social Media ◽

Content Analysis ◽

Air Quality ◽

Sentiment Analysis ◽

Social Media Data ◽

Sina Weibo ◽

Emotion Words ◽

Tourist Destinations ◽

Media Data

Analyzing tourists’ perceptions of air quality is of great significance for studying tourist experience satisfaction and constructing the image of tourism destinations. In this study, using a web crawler, we collected 27,500 comments regarding the air quality of 195 of China’s Class 5A tourist destinations posted by tourists on Sina Weibo from January 2011 to December 2017; these comments were then subjected to content analysis using the Gooseeker, ROST CM (Content Mining System) and BosonNLP (Natural Language Processing) tools. Based on an analysis of the proportions of sentences with different emotional polarities using ROST EA (Emotion Analysis), we measured the sentiment value of the texts with an artificial neural network (ANN) machine learning method, implemented in Python through the Boson platform, which is oriented toward Chinese social media data. The content analysis results indicated that, in the adaptation stage on Sina Weibo, tourists’ perceptions of air quality were mainly positive and their awareness of the air pollution crisis was poor. Objective emotion words accounted for a proportion about as high as that of subjective emotion words, indicating that considering objective and subjective emotion words simultaneously helps to comprehensively understand the emotional content of the comments. The sentiment analysis results showed that, for the entire text, sentences with positive emotions accounted for 85.53% of the total comments, with a sentiment value of 0.786, which corresponds to a medium positive level; the temporal “up-down-up” changes and the spatial pattern of high values in the south and low values in the north (with little difference between east and west) were basically consistent with reality. Further exploration of the theoretical basis of the semi-supervised ANN approach, or the introduction of other machine learning methods using different data sources, will help to analyze this phenomenon in greater depth.
The paper offers evidence for the value of new data and methods in air quality research at tourist destinations and provides a new tool for air quality monitoring.
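Headline figures of the kind reported here, a share of positive sentences and an overall sentiment value, are simple aggregates once each sentence and text has been scored. The sketch below assumes polarity labels and per-text sentiment scores in [0, 1] have already been produced by some classifier; the abstract does not describe the Boson platform's output format, so the score range and label names are assumptions, and the data are placeholders.

```python
from collections import Counter
from statistics import mean


def polarity_shares(labels: list[str]) -> dict[str, float]:
    """Fraction of sentences carrying each polarity label."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def mean_sentiment(scores: list[float]) -> float:
    """Mean per-text sentiment value; each score is assumed to lie in [0, 1]."""
    if not scores:
        raise ValueError("no scores given")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("sentiment values must lie in [0, 1]")
    return mean(scores)


# Hypothetical labels and scores (placeholders, NOT the study's data).
labels = ["positive", "positive", "positive", "negative"]
scores = [0.75, 0.5, 1.0, 0.25]
print(polarity_shares(labels))  # {'positive': 0.75, 'negative': 0.25}
print(mean_sentiment(scores))   # 0.625
```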


Analyzing social media data: A mixed-methods framework combining computational and qualitative text analysis

10.31234/osf.io/bynz4 ◽

2019 ◽

Author(s):

Matthew Andreotta ◽

Robertus Nugroho ◽

Mark Hurlstone ◽

Fabio Boschetti ◽

Simon Farrell ◽

...

Keyword(s):

Social Media ◽

Qualitative Analysis ◽

Data Science ◽

Large Data ◽

Extraction Process ◽

Data Set ◽

Diverse Range ◽

Social Media Data ◽

Qualitative Thematic Analysis ◽

Media Data

To qualitative researchers, social media offers a novel opportunity to harvest a massive and diverse range of content without the need for intrusive or intensive data collection procedures. However, performing a qualitative analysis across a massive social media data set is cumbersome and impractical. Instead, researchers often extract a subset of content to analyze, but a framework to facilitate this process is currently lacking. We present a four-phase framework for improving this extraction process, which blends the capacity of data science techniques to compress large data sets into smaller spaces with the capability of qualitative analysis to address research questions. We demonstrate this framework by investigating the topics of Australian Twitter commentary on climate change, using quantitative (Non-Negative Matrix inter-joint Factorization; Topic Alignment) and qualitative (Thematic Analysis) techniques. Our approach is useful for researchers seeking to perform qualitative analyses of social media, or for researchers wanting to supplement their quantitative work with a qualitative analysis of broader social context and meaning.
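To illustrate the kind of compression step such a framework relies on, the sketch below factorizes a tiny document-term matrix with standard non-negative matrix factorization using Lee and Seung multiplicative updates. It is a stand-in, not the paper's Non-Negative Matrix inter-joint Factorization or its Topic Alignment step; the toy documents, whitespace tokenizer, topic count and iteration budget are all assumptions.

```python
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def doc_term_matrix(docs: list[str]) -> tuple[Matrix, list[str]]:
    """Count matrix (documents x vocabulary) using whitespace tokenization."""
    vocab = sorted({t for d in docs for t in d.lower().split()})
    index = {t: j for j, t in enumerate(vocab)}
    v = [[0.0] * len(vocab) for _ in docs]
    for i, d in enumerate(docs):
        for t in d.lower().split():
            v[i][index[t]] += 1.0
    return v, vocab


def nmf(v: Matrix, k: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-9) -> tuple[Matrix, Matrix]:
    """Factor v ~= w @ h with w, h >= 0 via multiplicative updates (Frobenius loss)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def squared_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of v - w @ h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def top_terms(h: Matrix, vocab: list[str], n: int = 2) -> list[list[str]]:
    """The n highest-weighted vocabulary terms in each topic row of h."""
    return [[vocab[j] for j in sorted(range(len(row)), key=lambda j: -row[j])[:n]]
            for row in h]


# Toy "tweets" (hypothetical, not from the paper's corpus).
docs = ["carbon tax emissions", "emissions carbon policy",
        "bushfire smoke heat", "heat bushfire drought"]
v, vocab = doc_term_matrix(docs)
w, h = nmf(v, k=2)
# Typically one carbon/emissions topic and one bushfire/heat topic; NMF can
# settle in local optima, so the grouping may vary with the seed.
print(top_terms(h, vocab))
```

In the framework's terms, rows of `w` indicate how strongly each post loads on each topic, which is one way to pick a manageable subset of posts for qualitative thematic analysis.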


Qualitative Analysis of Social Media Data

SAGE Research Methods Foundations ◽

10.4135/9781526421036840280 ◽

2020 ◽

Cited By ~ 2

Keyword(s):

Social Media ◽

Qualitative Analysis ◽

Social Media Data ◽

Media Data


Exploring the usage of social media in extant campus sustainability assessment frameworks for sustainable campus development

International Journal of Sustainability in Higher Education ◽

10.1108/ijshe-03-2021-0091 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Yusuf A. Adenle ◽

Mohammed Abdul-Rahman ◽

Oluwole A. Soyinka

Keyword(s):

Social Media ◽

Content Analysis ◽

Environmental Sustainability ◽

Sustainability Indicators ◽

Fair Use ◽

Social Media Data ◽

Content Type ◽

Tertiary Institutions ◽

Campus Sustainability ◽

Media Data

Purpose As one of the buzzwords of the present age, with considerable impact on tertiary institutions, social media use in online teaching, learning and information dissemination has been extensively discussed in the extant literature. This paper aims to explore existing campus sustainability appraisal (CSA) tools to identify the extent to which social media has been used, especially in the selection and empirical verification of environmental sustainability indicators. Design/methodology/approach The methodology is mainly based on a desktop study involving a comprehensive review and content analysis of existing CSA tools’ documents. A webpage content analysis of a selected sustainability monitoring and tracking system for higher education institutions was also conducted. Findings The content analysis of the tools reveals insufficient use of social media data and platforms in selecting the environmental-dimension indicators of campus sustainability. To bridge this research gap, the study proposes using user-generated social media content to appraise preferences for campus-wide environmental sustainability indicators in tertiary institutions. Practical implications Adoption and modification of this study’s proposed approach by tertiary institutions, especially in sub-Saharan African countries, could help address many of the campus-wide environmental challenges raised, commented on and discussed on social media. Originality/value This study addresses knowledge gaps by revealing the extent of social media use in extant tools. With various tertiary institutions worldwide making increasing use of different social media platforms, their administrators are responsible for putting these social media data to fair use.


Risks associated with antiretroviral treatment for human immunodeficiency virus (HIV): qualitative analysis of social media data and health state utility valuation

Quality of Life Research ◽

10.1007/s11136-017-1519-3 ◽

2017 ◽

Vol 26 (7) ◽

pp. 1785-1798

Author(s):

Louis S. Matza ◽

Karen C. Chung ◽

Katherine J. Kim ◽

Trena M. Paulus ◽

Evan W. Davies ◽

...

Keyword(s):

Social Media ◽

Human Immunodeficiency Virus ◽

Qualitative Analysis ◽

Antiretroviral Treatment ◽

Health State Utility ◽

Health State ◽

Social Media Data ◽

Immunodeficiency Virus ◽

Media Data


Social Media Data Relevant for Measuring Key Performance Indicators? A Content Analysis Approach

Lecture Notes in Business Information Processing - Co-created Effective, Agile, and Trusted eServices ◽

10.1007/978-3-642-39808-7_7 ◽

2013 ◽

pp. 74-84 ◽

Cited By ~ 3

Author(s):

Joeri Heijnen ◽

Mark de Reuver ◽

Harry Bouwman ◽

Martijn Warnier ◽

Han Horlings

Keyword(s):

Social Media ◽

Content Analysis ◽

Performance Indicators ◽

Key Performance Indicators ◽

Analysis Approach ◽

Social Media Data ◽

Media Data


Using Reports of Symptoms and Diagnoses on Social Media to Predict COVID-19 Case Counts in Mainland China: Observational Infoveillance Study (Preprint)

10.2196/preprints.19421 ◽

2020 ◽

Cited By ~ 9

Author(s):

Cuihua Shen ◽

Anfan Chen ◽

Chen Luo ◽

Jingwen Zhang ◽

Bo Feng ◽

...

Keyword(s):

Machine Learning ◽

Social Media ◽

Mainland China ◽

Hubei Province ◽

Learning Approaches ◽

Theoretical Understanding ◽

Unequal Distribution ◽

Social Media Data ◽

Screening And Surveillance ◽

Media Data

BACKGROUND Coronavirus disease (COVID-19) has affected more than 200 countries and territories worldwide. This disease poses an extraordinary challenge for public health systems because screening and surveillance capacity is often severely limited, especially at the beginning of an outbreak; this can fuel the outbreak, as many patients may unknowingly infect other people. OBJECTIVE The aim of this study was to collect and analyze posts related to COVID-19 on Weibo, a popular Twitter-like social media site in China. To our knowledge, this infoveillance study employs the largest, most comprehensive, and most fine-grained social media dataset to date to predict COVID-19 case counts in mainland China. METHODS We built a Weibo user pool of 250 million people, approximately half of the entire monthly active Weibo user population. Using a comprehensive list of 167 keywords, we retrieved and analyzed around 15 million COVID-19–related posts from our user pool from November 1, 2019, to March 31, 2020. We developed a machine learning classifier to identify “sick posts,” in which users report their own or other people’s symptoms and diagnoses related to COVID-19. Using officially reported case counts as the outcome, we then estimated the Granger causality of sick posts and other COVID-19 posts on daily case counts. For a subset of geotagged posts (3.10% of all retrieved posts), we also ran separate predictive models for Hubei province, the epicenter of the initial outbreak, and for the rest of mainland China. RESULTS We found that reports of symptoms and diagnoses of COVID-19 significantly predicted daily case counts up to 14 days ahead of official statistics, whereas other COVID-19 posts did not have similar predictive power. For the subset of geotagged posts, we found that the predictive pattern held for both Hubei province and the rest of mainland China, regardless of the unequal distribution of health care resources and the outbreak timeline.
CONCLUSIONS Public social media data can be usefully harnessed to predict infection cases and inform timely responses. Researchers and disease control agencies should pay close attention to the social media infosphere regarding COVID-19. In addition to monitoring overall search and posting activities, leveraging machine learning approaches and theoretical understanding of information sharing behaviors is a promising approach to identify true disease signals and improve the effectiveness of infoveillance.
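The Granger-causality step can be sketched as a comparison of two autoregressions: one predicting daily cases from their own lags, and one that also includes lagged counts of sick posts, summarized by an F statistic. This is a minimal illustration under stated assumptions: the series are simulated placeholders, a single lag is used (the study reports prediction up to 14 days ahead; its actual lag structure is not given here), and converting F to a p-value is left to a statistics library such as `scipy.stats.f.sf(f_stat, df1, df2)`.

```python
import random


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear regressors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * c for x, c in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_rss(rows: list[list[float]], y: list[float]) -> float:
    """Residual sum of squares of the least-squares fit of y on the given rows."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations (X^T X) beta = X^T y
    return sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def granger_f(y: list[float], x: list[float], lag: int) -> tuple[float, int, int]:
    """F statistic for 'lags of x help predict y beyond lags of y'."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    ts = range(lag, len(y))
    target = [y[t] for t in ts]
    restricted = [[1.0] + [y[t - i] for i in range(1, lag + 1)] for t in ts]
    unrestricted = [row + [x[t - i] for i in range(1, lag + 1)]
                    for row, t in zip(restricted, ts)]
    df1, df2 = lag, len(target) - (2 * lag + 1)
    if df2 <= 0:
        raise ValueError("series too short for this lag")
    rss_r, rss_u = ols_rss(restricted, target), ols_rss(unrestricted, target)
    return ((rss_r - rss_u) / df1) / (rss_u / df2), df1, df2


# Hypothetical series: cases[t] depends on sick_posts[t - 1] plus noise.
rng = random.Random(1)
sick_posts = [rng.uniform(0, 1) for _ in range(60)]
cases = [0.0] + [2 * sick_posts[t - 1] + rng.gauss(0, 0.1) for t in range(1, 60)]
f_stat, df1, df2 = granger_f(cases, sick_posts, lag=1)
print(f"F({df1}, {df2}) = {f_stat:.1f}")
```

A large F here says only that lagged sick posts improve the in-sample fit for cases; as the abstract notes, telling true disease signals apart from general COVID-19 chatter is what the classifier step is for.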
