scholarly journals Comparison of Social Media, Syndromic Surveillance, and Microbiologic Acute Respiratory Infection Data: Observational Study

10.2196/14986 ◽  
2020 ◽  
Vol 6 (2) ◽  
pp. e14986 ◽  
Author(s):  
Ashlynn R Daughton ◽  
Rumi Chunara ◽  
Michael J Paul

Background Internet data can be used to improve infectious disease models. However, the representativeness and individual-level validity of internet-derived measures are largely unexplored as this requires ground truth data for study. Objective This study sought to identify relationships between Web-based behaviors and/or conversation topics and health status using a ground truth, survey-based dataset. Methods This study leveraged a unique dataset of self-reported surveys, microbiological laboratory tests, and social media data from the same individuals toward understanding the validity of individual-level constructs pertaining to influenza-like illness in social media data. Logistic regression models were used to identify illness in Twitter posts using user posting behaviors and topic model features extracted from users’ tweets. Results Of 396 original study participants, only 81 met the inclusion criteria for this study. Of these participants’ tweets, we identified only two instances that were related to health and occurred within 2 weeks (before or after) of a survey indicating symptoms. It was not possible to predict when participants reported symptoms using features derived from topic models (area under the curve [AUC]=0.51; P=.38), though it was possible using behavior features, albeit with a very small effect size (AUC=0.53; P≤.001). Individual symptoms were also generally not predictable either. The study sample and a random sample from Twitter are predictably different on held-out data (AUC=0.67; P≤.001), meaning that the content posted by people who participated in this study was predictably different from that posted by random Twitter users. Individuals in the random sample and the GoViral sample used Twitter with similar frequencies (similar @ mentions, number of tweets, and number of retweets; AUC=0.50; P=.19). Conclusions To our knowledge, this is the first instance of an attempt to use a ground truth dataset to validate infectious disease observations in social media data. The lack of signal, the lack of predictability among behaviors or topics, and the demonstrated volunteer bias in the study population are important findings for the large and growing body of disease surveillance using internet-sourced data.

2021 ◽  
Author(s):  
Koustuv Saha ◽  
Asra Yousuf ◽  
Ryan L. Boyd ◽  
James W. Pennebaker ◽  
Munmun Choudhury

Abstract The mental health of college students is a growing concern, and gauging the mental health needs of college students is difficult to assess in real-time and in scale. While social media has shown potential as a viable “passive sensor” of mental health, the construct validity and in-practice reliability of such computational assessments remain largely unexplored. Towards this goal, we study how assessing the mental health of college students using social media data correspond with ground-truth data of on-campus mental health consultations. For a large U.S. public university, we obtained ground-truth data of on-campus mental health consultations between 2011–2016, and collected 66,000 posts from the university’s Reddit community. We adopted machine learning and natural language methodologies to measure symptomatic mental health expressions of depression, anxiety, stress, suicidal ideation, and psychosis on the social media data. Seasonal auto-regressive integrated moving average (SARIMA) models of forecasting on-campus mental health consultations showed that incorporating social media data led to predictions with r=0.86 and SMAPE=13.30, outperforming models without social media data by 41%. Our language analyses revealed that social media discussions during high mental health consultations months consisted of discussions on academics and career, whereas months of low mental health consultations saliently show expressions of positive affect, collective identity, and socialization. This study reveals that social media data can improve our understanding of college students’ mental health, particularly their mental health treatment needs.


2022 ◽  
Vol 12 (1) ◽  
Author(s):  
Koustuv Saha ◽  
Asra Yousuf ◽  
Ryan L. Boyd ◽  
James W. Pennebaker ◽  
Munmun De Choudhury

AbstractThe mental health of college students is a growing concern, and gauging the mental health needs of college students is difficult to assess in real-time and in scale. To address this gap, researchers and practitioners have encouraged the use of passive technologies. Social media is one such "passive sensor" that has shown potential as a viable "passive sensor" of mental health. However, the construct validity and in-practice reliability of computational assessments of mental health constructs with social media data remain largely unexplored. Towards this goal, we study how assessing the mental health of college students using social media data correspond with ground-truth data of on-campus mental health consultations. For a large U.S. public university, we obtained ground-truth data of on-campus mental health consultations between 2011–2016, and collected 66,000 posts from the university’s Reddit community. We adopted machine learning and natural language methodologies to measure symptomatic mental health expressions of depression, anxiety, stress, suicidal ideation, and psychosis on the social media data. Seasonal auto-regressive integrated moving average (SARIMA) models of forecasting on-campus mental health consultations showed that incorporating social media data led to predictions with r = 0.86 and SMAPE = 13.30, outperforming models without social media data by 41%. Our language analyses revealed that social media discussions during high mental health consultations months consisted of discussions on academics and career, whereas months of low mental health consultations saliently show expressions of positive affect, collective identity, and socialization. This study reveals that social media data can improve our understanding of college students’ mental health, particularly their mental health treatment needs.


2021 ◽  
Author(s):  
Koustuv Saha ◽  
Asra Yousuf ◽  
Ryan Boyd ◽  
James Pennebaker ◽  
Munmun De Choudhury

Abstract The mental health of college students is a growing concern, and gauging the mental health needs of college students is difficult to assess in real-time and in scale. While social media has shown potential as a viable "passive sensor" of mental health, the construct validity and in-practice reliability of such computational assessments remain largely unexplored. Towards this goal, we study how assessing the mental health of college students using social media data correspond with ground-truth data of on-campus mental health consultations. For a large U.S. public university, we obtained ground-truth data of on-campus mental health consultations between 2011-2016, and collected 66,000 posts from the university's Reddit community. We adopted machine learning and natural language methodologies to measure symptomatic mental health expressions of depression, anxiety, stress, suicidal ideation, and psychosis on the social media data. Seasonal auto-regressive integrated moving average (SARIMA) models of forecasting on-campus mental health consultations showed that incorporating social media data led to predictions with r=0.86 and SMAPE=13.30, outperforming models without social media data by 41%. Our language analyses revealed that social media discussions during high mental health consultations months consisted of discussions on academics and career, whereas months of low mental health consultations saliently show expressions of positive affect, collective identity, and socialization. This study reveals that social media data can improve our understanding of college students' mental health, particularly their mental health treatment needs.


2021 ◽  
Vol 10 (7) ◽  
pp. 474
Author(s):  
Bingqing Wang ◽  
Bin Meng ◽  
Juan Wang ◽  
Siyu Chen ◽  
Jian Liu

Social media data contains real-time expressed information, including text and geographical location. As a new data source for crowd behavior research in the era of big data, it can reflect some aspects of the behavior of residents. In this study, a text classification model based on the BERT and Transformers framework was constructed, which was used to classify and extract more than 210,000 residents’ festival activities based on the 1.13 million Sina Weibo (Chinese “Twitter”) data collected from Beijing in 2019 data. On this basis, word frequency statistics, part-of-speech analysis, topic model, sentiment analysis and other methods were used to perceive different types of festival activities and quantitatively analyze the spatial differences of different types of festivals. The results show that traditional culture significantly influences residents’ festivals, reflecting residents’ motivation to participate in festivals and how residents participate in festivals and express their emotions. There are apparent spatial differences among residents in participating in festival activities. The main festival activities are distributed in the central area within the Fifth Ring Road in Beijing. In contrast, expressing feelings during the festival is mainly distributed outside the Fifth Ring Road in Beijing. The research integrates natural language processing technology, topic model analysis, spatial statistical analysis, and other technologies. It can also broaden the application field of social media data, especially text data, which provides a new research paradigm for studying residents’ festival activities and adds residents’ perception of the festival. The research results provide a basis for the design and management of the Chinese festival system.


2017 ◽  
Vol 99 ◽  
pp. 18-29 ◽  
Author(s):  
Marianela García Lozano ◽  
Jonah Schreiber ◽  
Joel Brynielsson

2018 ◽  
Vol 7 (12) ◽  
pp. 481
Author(s):  
Zhewei Liu ◽  
Xiaolin Zhou ◽  
Wenzhong Shi ◽  
Anshu Zhang

Detecting events using social media data is important for timely emergency response and urban monitoring. Current studies primarily use semantic-based methods, in which “bursts” of certain semantic signals are detected to identify emerging events. Nevertheless, our consideration is that a social event will not only affect semantic signals but also cause irregular human mobility patterns. By introducing depictive features, such irregular patterns can be used for event detection. Consequently, in this paper, we develop a novel, comprehensive workflow for event detection by mining the geographical patterns of VGI. This workflow first uses data geographical topic modeling to detect the hashtag communities with VGI semantic data. Both global and local indicators are then constructed by introducing spatial autocorrelation measurements. We then adopt an outlier test and generate indicator maps to spatiotemporally identify the potential social events. This workflow was implemented using a real-world dataset (104,000 geo-tagged photos) and the evaluation was conducted both qualitatively and quantitatively. A set of experiments showed that the discovered semantic communities were internally consistent and externally differentiable, and the plausibility of the detected events was demonstrated by referring to the available ground truth. This study examined the feasibility of detecting events by investigating the geographical patterns of social media data and can be applied to urban knowledge retrieval.


2015 ◽  
Vol 109 (1) ◽  
pp. 62-78 ◽  
Author(s):  
ROBERT BOND ◽  
SOLOMON MESSING

We demonstrate that social media data represent a useful resource for testing models of legislative and individual-level political behavior and attitudes. First, we develop a model to estimate the ideology of politicians and their supporters using social media data on individual citizens’ endorsements of political figures. Our measure allows us to place politicians and more than 6 million citizens who are active in social media on the same metric. We validate the ideological estimates that result from the scaling process by showing they correlate highly with existing measures of ideology from Congress, and with individual-level self-reported political views. Finally, we use these measures to study the relationship between ideology and age, social relationships and ideology, and the relationship between friend ideology and turnout.


Sign in / Sign up

Export Citation Format

Share Document