Social Media Data Extraction and Content Analysis

2017

doi:10.2196/18700, 2020, Vol 6 (2), pp. e18700
Author(s): Jiawei Li, Qing Xu, Raphael Cuomo, Vidya Purushothaman, Tim Mackey

Background: The coronavirus disease (COVID-19) pandemic, which began in Wuhan, China in December 2019, is rapidly spreading worldwide, with over 1.9 million cases as of mid-April 2020. Infoveillance approaches using social media can help characterize disease distribution and the public knowledge, attitudes, and behaviors that are critical in the early stages of an outbreak.

Objective: The aim of this study is to conduct a quantitative and qualitative assessment of Chinese social media posts originating in Wuhan City on the Chinese microblogging platform Weibo during the early stages of the COVID-19 outbreak.

Methods: Chinese-language messages from Wuhan were collected on Weibo for 39 days, between December 23, 2019, and January 30, 2020. For quantitative analysis, the total daily cases of COVID-19 in Wuhan were obtained from the Chinese National Health Commission, and a linear regression model was used to determine whether Weibo COVID-19 posts were predictive of the number of cases reported. Qualitative content analysis and an inductive manual coding approach were used to identify parent classifications of news and user-generated COVID-19 topics.

Results: A total of 115,299 Weibo posts were collected during the study time frame, an average of 2956 posts per day (minimum 0, maximum 13,587). Quantitative analysis found a positive correlation between the number of Weibo posts and the number of reported cases from Wuhan, with approximately 10 more COVID-19 cases per 40 social media posts (P<.001). This effect size was larger than that observed for the rest of China excluding Hubei Province (of which Wuhan is the capital city), and the association held when the number of Weibo posts was compared with the incidence proportion of cases in Hubei Province. Qualitative analysis of 11,893 posts from the first 21 days of the study period that contained COVID-19-related posts uncovered four parent classifications: Weibo discussions about the causative agent of the disease, the changing epidemiological characteristics of the outbreak, public reaction to outbreak control and response measures, and other topics. Generally, these themes also exhibited public uncertainty and changing knowledge and attitudes about COVID-19, including posts exhibiting both protective and higher-risk behaviors.

Conclusions: The results of this study provide initial insight into the origins of the COVID-19 outbreak based on quantitative and qualitative analysis of Chinese social media data at the initial epicenter in Wuhan City. Future studies should continue to explore the utility of social media data to predict COVID-19 disease severity, measure public reaction and behavior, and evaluate the effectiveness of outbreak communication.
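The regression described in the Methods can be sketched as a one-predictor ordinary least squares fit of daily reported cases on daily Weibo post counts. This is a minimal sketch, not the authors' code: the function name `fit_ols` and the five data points are hypothetical, chosen so that the fitted slope matches the reported effect size of about 10 cases per 40 posts (0.25 cases per post). The last line checks the reported daily average of 115,299 posts over 39 days.

```python
from statistics import mean

def fit_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x with one predictor."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical daily series (not the study's data): posts per day, cases per day.
posts = [0, 400, 800, 1200, 1600]
cases = [5, 105, 205, 305, 405]
intercept, slope = fit_ols(posts, cases)
print(f"slope = {slope:.2f} cases per post; {slope * 40:.0f} cases per 40 posts")

# Consistency check on the reported average: 115,299 posts over 39 days.
print(f"average posts per day = {115_299 / 39:.0f}")
```

This prints a slope of 0.25 (10 cases per 40 posts) and an average of 2956 posts per day. A P value such as the reported P<.001 would additionally require the slope's standard error, which this sketch omits. On Python 3.10 and later, `statistics.linear_regression(posts, cases)` returns the same slope and intercept.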


2019, Vol 11 (18), pp. 5070
Author(s): Yuguo Tao, Feng Zhang, Chunyun Shi, Yun Chen

Analyzing tourists' perceptions of air quality is of great significance to the study of tourist experience satisfaction and to the image construction of tourism destinations. In this study, using a web crawler, we collected 27,500 comments about the air quality of 195 of China's Class 5A tourist destinations, posted by tourists on Sina Weibo from January 2011 to December 2017; these comments were then subjected to content analysis using the Gooseeker, ROST CM (Content Mining System), and BosonNLP (Natural Language Processing) tools. Based on an analysis of the proportions of sentences with different emotional polarities using ROST EA (Emotion Analysis), we measured the sentiment value of the texts with an artificial neural network (ANN) machine learning method, implemented through the Python-based Boson platform, which is oriented toward Chinese social media data. The content analysis results indicated that, in the adaptation stage on Sina Weibo, tourists' perceptions of air quality were mainly positive and their awareness of the air pollution crisis was poor. Objective emotion words accounted for a proportion similar to that of subjective emotion words, indicating that considering both types together helps to comprehensively understand the emotional content of the comments. The sentiment analysis results showed that, for the entire text, sentences with positive emotions accounted for 85.53% of the total comments, with a sentiment value of 0.786, which falls in the medium positive level. The temporal "up-down-up" pattern of change and the spatial pattern (high in the south, low in the north, with little difference between east and west) were basically consistent with reality. Further exploration of the theoretical basis of the semi-supervised ANN approach, or the introduction of other machine learning methods using different data sources, will help to analyze this phenomenon in greater depth.
The paper contributes new data and methods for air quality research in tourist destinations and provides a new tool for air quality monitoring.
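The kind of summary reported above (a share of positive sentences plus an overall sentiment value) can be illustrated with a small aggregation step. This sketch assumes each sentence has already been scored in [0, 1] by some classifier; the study used ROST EA and an ANN on the Boson platform, neither of which is reproduced here. The 0.5 threshold, the use of the mean score as the "sentiment value", the function name `summarize_sentiment`, and the sample scores are all hypothetical.

```python
from statistics import mean

def summarize_sentiment(scores, threshold=0.5):
    """Summarize per-sentence sentiment scores in [0, 1], where 1 is most positive.

    Assumption (hypothetical): a sentence counts as positive above `threshold`,
    negative below it, and neutral at exactly `threshold`; the text-level
    sentiment value is the mean score.
    """
    if not scores:
        raise ValueError("no scores to summarize")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")
    n = len(scores)
    positive = sum(1 for s in scores if s > threshold)
    negative = sum(1 for s in scores if s < threshold)
    return {
        "positive_share": positive / n,
        "negative_share": negative / n,
        "neutral_share": (n - positive - negative) / n,
        "sentiment_value": mean(scores),
    }

# Hypothetical scores for four sentences (not the study's data).
summary = summarize_sentiment([0.9, 0.8, 0.7, 0.2])
print(summary)
```

Keeping classification (the threshold) separate from aggregation (the shares and mean) makes it easy to swap in a different scorer without touching the summary logic.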


2015
Author(s): Evika Karamagioli

Background: As social media use generates huge amounts of data, big data analysis is needed to synthesize this information and determine what actions to take. Online communication channels such as Facebook, Twitter, and Instagram provide a wealth of passively collected data that may be mined for public health purposes such as health surveillance, health crisis management, and, not least, health promotion and education.

Objective: We explore the international literature on the potential role of, and perspectives on, using social media as a big data source for public health purposes.

Method: Systematic literature review. Data extraction and synthesis were performed using thematic analysis.

Results: Examples of those currently collecting and analyzing big data from user-generated social content include scientists working with the Centers for Disease Control and Prevention, who track the spread of flu by analyzing what users search for, and the World Health Organization, which is working on disaster management relief. But what exactly do we do with this big social media data? Through these platforms and processing services, we can track real-time trends and understand them more quickly. By processing big social media data, it is possible to identify specific patterns in conversation topics, user behaviors, overall trends and influencers, sociodemographic characteristics, lifestyle behaviors, and social and cultural constructs.

Conclusion: The key to fostering the convergence of big data and social media is to process and analyze the right data for public health purposes, so as to provide strategic insights for the planning, execution, and measurement of effective and efficient public health interventions. In this effort, political, economic, and legal obstacles need to be seriously considered.
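One simple way to track trends in conversation topics is to count, per day, the posts that mention a set of keywords and then look at day-over-day changes. The sketch below is a hypothetical illustration, not a method taken from the review: the function names, keywords, and posts are invented, and plain case-insensitive substring matching is a deliberate simplification (for example, "flu" would also match "influence").

```python
from collections import Counter
from datetime import date, timedelta

def daily_mentions(posts, keywords):
    """Per-day count of posts containing any keyword (case-insensitive substring).

    Days between the first and last post with no matching posts get a count
    of 0, so the series has no gaps.
    """
    if not posts:
        return {}
    terms = [k.lower() for k in keywords]
    counts = Counter(day for day, text in posts
                     if any(t in text.lower() for t in terms))
    first = min(day for day, _ in posts)
    last = max(day for day, _ in posts)
    series = {}
    day = first
    while day <= last:
        series[day] = counts[day]  # Counter returns 0 for days with no matches
        day += timedelta(days=1)
    return series

def day_over_day(series):
    """Change in mention count relative to the previous day."""
    days = sorted(series)
    return {days[i]: series[days[i]] - series[days[i - 1]] for i in range(1, len(days))}

# Hypothetical posts (not real data).
posts = [
    (date(2020, 1, 1), "Flu season again"),
    (date(2020, 1, 1), "Nice weather today"),
    (date(2020, 1, 2), "No news"),
    (date(2020, 1, 3), "Flu shots available downtown"),
    (date(2020, 1, 3), "Fever and chills all night"),
]
series = daily_mentions(posts, ["flu", "fever"])
changes = day_over_day(series)
print(series)
print(changes)
```

Filling zero-count days matters: without it, a day-over-day comparison would silently skip quiet days and overstate the size of jumps.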


2021, Vol ahead-of-print
Author(s): Yusuf A. Adenle, Mohammed Abdul-Rahman, Oluwole A. Soyinka

Purpose: As one of the buzzwords of the present age, with considerable impact on tertiary institutions, social media use in online teaching, learning, and information dissemination has been extensively discussed in the extant literature. This paper aims to explore existing campus sustainability appraisal (CSA) tools to identify the extent to which social media has been used, especially in the selection and empirical verification of environmental sustainability indicators.

Design/methodology/approach: The methodology is mainly based on a desktop study involving a comprehensive review and content analysis of existing CSA tools' documents. A webpage content analysis of selected sustainability monitoring and tracking systems in higher education institutions was also conducted.

Findings: The content analysis of the tools reveals insufficient use of social media data and platforms in selecting environmental-dimension indicators for campus sustainability. To bridge this research gap, the paper proposes using social media user-generated content to appraise preferences for campus-wide environmental sustainability indicators in tertiary institutions.

Practical implications: The adoption and modification of this study's proposed approach by tertiary institutions, especially in sub-Saharan African countries, could help address most of the campus-wide environmental challenges raised, commented on, and discussed on social media.

Originality/value: This study helps fill knowledge gaps by revealing the extent of social media use in existing tools. With the expanding use of different social media platforms by tertiary institutions worldwide, their administrators are responsible for putting these social media data to fair use.
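For illustration only, a document content analysis like the one described under Design/methodology/approach can be approximated by counting whole-word mentions of social media terms in each tool's text and reporting the share of tools that mention any of them. This is a hypothetical sketch, not the authors' procedure: the function names, tool names, texts, and search terms are placeholders.

```python
import re

def term_counts(documents, terms):
    """Count case-insensitive, whole-word matches of each term in each document."""
    patterns = {t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms}
    return {
        name: {t: len(p.findall(text)) for t, p in patterns.items()}
        for name, text in documents.items()
    }

def share_mentioning(counts):
    """Fraction of documents with at least one match for any term."""
    if not counts:
        return 0.0
    hits = sum(1 for per_term in counts.values() if any(per_term.values()))
    return hits / len(counts)

# Placeholder appraisal-tool documents (hypothetical, not real CSA tools).
documents = {
    "Tool A": "Energy, water and waste indicators are scored annually.",
    "Tool B": "Stakeholder input is gathered via social media and Twitter polls.",
    "Tool C": "Transport indicators; see the Social Media appendix.",
}
counts = term_counts(documents, ["social media", "twitter", "facebook"])
print(counts)
print(f"share of tools mentioning social media terms: {share_mentioning(counts):.2f}")
```

Whole-word matching via `\b` avoids false hits inside longer words, while `re.escape` keeps multi-word or punctuated terms literal.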


Author(s): Dodo Zaenal Abidin, Siti Nurmaini, Reza Firsandaya Malik, Jasmir, Errissya Rasywir, ...
