Comparison of Social Media, Syndromic Surveillance, and Microbiologic Acute Respiratory Infection Data: Observational Study

Background: Internet data can be used to improve infectious disease models. However, the representativeness and individual-level validity of internet-derived measures are largely unexplored, as studying them requires ground truth data.

Objective: This study sought to identify relationships between Web-based behaviors and/or conversation topics and health status using a ground truth, survey-based dataset.

Methods: This study leveraged a unique dataset of self-reported surveys, microbiological laboratory tests, and social media data from the same individuals to assess the validity of individual-level constructs pertaining to influenza-like illness in social media data. Logistic regression models were used to identify illness in Twitter posts using user posting behaviors and topic model features extracted from users’ tweets.

Results: Of 396 original study participants, only 81 met the inclusion criteria for this study. Of these participants’ tweets, we identified only two instances that were related to health and occurred within 2 weeks (before or after) of a survey indicating symptoms. It was not possible to predict when participants reported symptoms using features derived from topic models (area under the curve [AUC]=0.51; P=.38), though it was possible using behavior features, albeit with a very small effect size (AUC=0.53; P≤.001). Individual symptoms were generally not predictable either. The study sample and a random sample from Twitter were predictably different on held-out data (AUC=0.67; P≤.001), meaning that the content posted by people who participated in this study was predictably different from that posted by random Twitter users. Individuals in the random sample and the GoViral sample used Twitter with similar frequencies (similar @ mentions, number of tweets, and number of retweets; AUC=0.50; P=.19).

Conclusions: To our knowledge, this is the first instance of an attempt to use a ground truth dataset to validate infectious disease observations in social media data. The lack of signal, the lack of predictability among behaviors or topics, and the demonstrated volunteer bias in the study population are important findings for the large and growing body of disease surveillance using internet-sourced data.
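For readers less familiar with the AUC values reported above: AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, so 0.50 corresponds to chance. The following is a minimal sketch of that pairwise definition; the function name, labels, and scores are illustrative assumptions and are not taken from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie contributes half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = symptomatic) and classifier scores, for illustration only.
example_labels = [1, 0, 1, 0]
example_scores = [0.9, 0.1, 0.4, 0.6]
print(auc(example_labels, example_scores))
```

A classifier whose scores carry no information about the label would land near 0.5 under this definition, which is how the topic-model result (AUC=0.51) should be read.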

Mental Health Consultations on College Campuses: Examining the Predictive Ability of Social Media

10.21203/rs.3.rs-196605/v1 ◽

2021 ◽

Author(s):

Koustuv Saha ◽

Asra Yousuf ◽

Ryan L. Boyd ◽

James W. Pennebaker ◽

Munmun De Choudhury

Keyword(s):

Mental Health ◽

College Students ◽

Social Media ◽

Collective Identity ◽

Ground Truth ◽

Treatment Needs ◽

Ground Truth Data ◽

Social Media Data ◽

Mental Health Consultations ◽

Media Data

Abstract The mental health of college students is a growing concern, and gauging their mental health needs is difficult in real time and at scale. While social media has shown potential as a viable “passive sensor” of mental health, the construct validity and in-practice reliability of such computational assessments remain largely unexplored. Towards this goal, we study how assessments of college students’ mental health from social media data correspond with ground-truth data of on-campus mental health consultations. For a large U.S. public university, we obtained ground-truth data of on-campus mental health consultations between 2011–2016, and collected 66,000 posts from the university’s Reddit community. We adopted machine learning and natural language methodologies to measure symptomatic mental health expressions of depression, anxiety, stress, suicidal ideation, and psychosis on the social media data. Seasonal auto-regressive integrated moving average (SARIMA) models forecasting on-campus mental health consultations showed that incorporating social media data led to predictions with r=0.86 and SMAPE=13.30, outperforming models without social media data by 41%. Our language analyses revealed that social media discussions during months with many mental health consultations centered on academics and career, whereas months with few consultations saliently showed expressions of positive affect, collective identity, and socialization. This study reveals that social media data can improve our understanding of college students’ mental health, particularly their mental health treatment needs.
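The SMAPE figure quoted in this abstract is a symmetric mean absolute percentage error. The abstract does not say which variant the authors used, so the sketch below assumes one common definition (absolute error divided by the mean of the absolute actual and forecast values, averaged and scaled to percent). The function name and the numbers are hypothetical.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (range 0 to 200).

    Assumed variant: mean of |F - A| / ((|A| + |F|) / 2), times 100.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        if denom == 0.0:
            continue  # both values are zero, so this term contributes no error
        total += abs(f - a) / denom
    return 100.0 * total / len(actual)


# Hypothetical monthly consultation counts and forecasts, for illustration only.
observed = [100, 200]
predicted = [110, 180]
print(round(smape(observed, predicted), 2))
```

Lower values indicate forecasts closer to the observed series; a perfect forecast gives 0.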

Social Media Discussions Predict Mental Health Consultations on College Campuses

Scientific Reports ◽

10.1038/s41598-021-03423-4 ◽

2022 ◽

Vol 12 (1) ◽

Author(s):

Koustuv Saha ◽

Asra Yousuf ◽

Ryan L. Boyd ◽

James W. Pennebaker ◽

Munmun De Choudhury

Keyword(s):

Mental Health ◽

College Students ◽

Social Media ◽

Ground Truth ◽

Treatment Needs ◽

Ground Truth Data ◽

Social Media Data ◽

Passive Sensor ◽

Mental Health Consultations ◽

Media Data

Abstract The mental health of college students is a growing concern, and gauging their mental health needs is difficult in real time and at scale. To address this gap, researchers and practitioners have encouraged the use of passive technologies. Social media is one such technology and has shown potential as a viable "passive sensor" of mental health. However, the construct validity and in-practice reliability of computational assessments of mental health constructs with social media data remain largely unexplored. Towards this goal, we study how assessments of college students’ mental health from social media data correspond with ground-truth data of on-campus mental health consultations. For a large U.S. public university, we obtained ground-truth data of on-campus mental health consultations between 2011–2016, and collected 66,000 posts from the university’s Reddit community. We adopted machine learning and natural language methodologies to measure symptomatic mental health expressions of depression, anxiety, stress, suicidal ideation, and psychosis on the social media data. Seasonal auto-regressive integrated moving average (SARIMA) models forecasting on-campus mental health consultations showed that incorporating social media data led to predictions with r = 0.86 and SMAPE = 13.30, outperforming models without social media data by 41%. Our language analyses revealed that social media discussions during months with many mental health consultations centered on academics and career, whereas months with few consultations saliently showed expressions of positive affect, collective identity, and socialization. This study reveals that social media data can improve our understanding of college students’ mental health, particularly their mental health treatment needs.

Mental Health Consultations on College Campuses: Examining the Predictive Ability of Social Media

10.21203/rs.3.rs-162266/v1 ◽

2021 ◽

Author(s):

Koustuv Saha ◽

Asra Yousuf ◽

Ryan Boyd ◽

James Pennebaker ◽

Munmun De Choudhury

Keyword(s):

Mental Health ◽

College Students ◽

Social Media ◽

Collective Identity ◽

Ground Truth ◽

Treatment Needs ◽

Ground Truth Data ◽

Social Media Data ◽

Mental Health Consultations ◽

Media Data

Abstract The mental health of college students is a growing concern, and gauging their mental health needs is difficult in real time and at scale. While social media has shown potential as a viable "passive sensor" of mental health, the construct validity and in-practice reliability of such computational assessments remain largely unexplored. Towards this goal, we study how assessments of college students' mental health from social media data correspond with ground-truth data of on-campus mental health consultations. For a large U.S. public university, we obtained ground-truth data of on-campus mental health consultations between 2011-2016, and collected 66,000 posts from the university's Reddit community. We adopted machine learning and natural language methodologies to measure symptomatic mental health expressions of depression, anxiety, stress, suicidal ideation, and psychosis on the social media data. Seasonal auto-regressive integrated moving average (SARIMA) models forecasting on-campus mental health consultations showed that incorporating social media data led to predictions with r=0.86 and SMAPE=13.30, outperforming models without social media data by 41%. Our language analyses revealed that social media discussions during months with many mental health consultations centered on academics and career, whereas months with few consultations saliently showed expressions of positive affect, collective identity, and socialization. This study reveals that social media data can improve our understanding of college students' mental health, particularly their mental health treatment needs.

Perceiving Residents’ Festival Activities Based on Social Media Data: A Case Study in Beijing, China

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10070474 ◽

2021 ◽

Vol 10 (7) ◽

pp. 474

Author(s):

Bingqing Wang ◽

Bin Meng ◽

Juan Wang ◽

Siyu Chen ◽

Jian Liu

Keyword(s):

Social Media ◽

Language Processing ◽

Topic Model ◽

Central Area ◽

Classification Model ◽

Social Media Data ◽

Ring Road ◽

Different Types ◽

Spatial Differences ◽

Media Data

Social media data contains real-time expressed information, including text and geographical location. As a new data source for crowd behavior research in the era of big data, it can reflect some aspects of the behavior of residents. In this study, a text classification model based on the BERT and Transformers framework was constructed, which was used to classify and extract more than 210,000 residents’ festival activities from the 1.13 million Sina Weibo (Chinese “Twitter”) data collected from Beijing in 2019. On this basis, word frequency statistics, part-of-speech analysis, topic modeling, sentiment analysis, and other methods were used to perceive different types of festival activities and quantitatively analyze the spatial differences among different types of festivals. The results show that traditional culture significantly influences residents’ festivals, reflecting residents’ motivation to participate in festivals and how residents participate in festivals and express their emotions. There are apparent spatial differences among residents in participating in festival activities. The main festival activities are distributed in the central area within the Fifth Ring Road in Beijing. In contrast, expressions of feelings during festivals are mainly distributed outside the Fifth Ring Road in Beijing. The research integrates natural language processing technology, topic model analysis, spatial statistical analysis, and other technologies. It can also broaden the application field of social media data, especially text data, which provides a new research paradigm for studying residents’ festival activities and adds residents’ perception of the festival. The research results provide a basis for the design and management of the Chinese festival system.

Creating full individual-level location timelines from sparse social media data

Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems - SIGSPATIAL '18 ◽

10.1145/3274895.3274982 ◽

2018 ◽

Cited By ~ 1

Author(s):

Nabeel Abdur Rehman ◽

Kunal Relia ◽

Rumi Chunara

Keyword(s):

Social Media ◽

Social Media Data ◽

Individual Level ◽

Media Data

Tracking geographical locations using a geo-aware topic model for analyzing social media data

Decision Support Systems ◽

10.1016/j.dss.2017.05.006 ◽

2017 ◽

Vol 99 ◽

pp. 18-29 ◽

Cited By ~ 14

Author(s):

Marianela García Lozano ◽

Jonah Schreiber ◽

Joel Brynielsson

Keyword(s):

Social Media ◽

Topic Model ◽

Social Media Data ◽

Media Data ◽

Geographical Locations

Probabilistic topic model based approach for detecting bursty events from social media data

2017 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC) ◽

10.1109/spac.2017.8304365 ◽

2017 ◽

Cited By ~ 1

Author(s):

Chunshan Li ◽

Dianhui Chu

Keyword(s):

Social Media ◽

Topic Model ◽

Social Media Data ◽

Model Based ◽

Probabilistic Topic Model ◽

Media Data

A topic model based framework for identifying the distribution of demand for relief supplies using social media data

International Journal of Geographical Information Science ◽

10.1080/13658816.2020.1869746 ◽

2021 ◽

pp. 1-22

Author(s):

Ting Zhang ◽

Shi Shen ◽

Changxiu Cheng ◽

Kai Su ◽

Xiangxue Zhang

Keyword(s):

Social Media ◽

Topic Model ◽

Social Media Data ◽

Model Based ◽

Media Data

Towards Detecting Social Events by Mining Geographical Patterns with VGI Data

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi7120481 ◽

2018 ◽

Vol 7 (12) ◽

pp. 481

Author(s):

Zhewei Liu ◽

Xiaolin Zhou ◽

Wenzhong Shi ◽

Anshu Zhang

Keyword(s):

Social Media ◽

Event Detection ◽

Human Mobility ◽

Ground Truth ◽

Mobility Patterns ◽

Social Media Data ◽

Geographical Patterns ◽

Outlier Test ◽

Social Events ◽

Media Data

Detecting events using social media data is important for timely emergency response and urban monitoring. Current studies primarily use semantic-based methods, in which “bursts” of certain semantic signals are detected to identify emerging events. However, a social event will not only affect semantic signals but also cause irregular human mobility patterns. By introducing depictive features, such irregular patterns can be used for event detection. Consequently, in this paper, we develop a novel, comprehensive workflow for event detection by mining the geographical patterns of VGI. This workflow first uses geographical topic modeling to detect the hashtag communities with VGI semantic data. Both global and local indicators are then constructed by introducing spatial autocorrelation measurements. We then adopt an outlier test and generate indicator maps to spatiotemporally identify the potential social events. This workflow was implemented using a real-world dataset (104,000 geo-tagged photos) and the evaluation was conducted both qualitatively and quantitatively. A set of experiments showed that the discovered semantic communities were internally consistent and externally differentiable, and the plausibility of the detected events was demonstrated by referring to the available ground truth. This study examined the feasibility of detecting events by investigating the geographical patterns of social media data and can be applied to urban knowledge retrieval.
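The abstract mentions global indicators built from spatial autocorrelation measurements without naming the statistic. As an illustration only, the sketch below assumes global Moran's I, one widely used measure; it may not be the one the authors chose. The function name, the four-cell layout, and the counts are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed at n locations.

    weights[i][j] is the spatial weight between locations i and j
    (here 1 for neighbors, 0 otherwise; the diagonal is 0).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    variance_term = sum(d * d for d in dev)
    if w_sum == 0 or variance_term == 0:
        raise ValueError("weights must be non-zero and values must vary")
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    return (n / w_sum) * (cross / variance_term)


# Four hypothetical cells in a row; each cell neighbors the adjacent ones.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = [1, 1, 5, 5]    # similar counts sit next to each other
alternating = [1, 5, 1, 5]  # neighbors always differ
print(morans_i(clustered, chain), morans_i(alternating, chain))
```

Positive values indicate that similar counts cluster in space and negative values indicate that neighbors tend to differ, which is the kind of signal an outlier test on such indicators could then flag.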

Quantifying Social Media’s Political Space: Estimating Ideology from Publicly Revealed Preferences on Facebook

American Political Science Review ◽

10.1017/s0003055414000525 ◽

2015 ◽

Vol 109 (1) ◽

pp. 62-78 ◽

Cited By ~ 73

Author(s):

Robert Bond ◽

Solomon Messing

Keyword(s):

Social Media ◽

Social Relationships ◽

Political Behavior ◽

Revealed Preferences ◽

Social Media Data ◽

Individual Level ◽

Political Views ◽

Scaling Process ◽

The Relationship ◽

Media Data

We demonstrate that social media data represent a useful resource for testing models of legislative and individual-level political behavior and attitudes. First, we develop a model to estimate the ideology of politicians and their supporters using social media data on individual citizens’ endorsements of political figures. Our measure allows us to place politicians and more than 6 million citizens who are active in social media on the same metric. We validate the ideological estimates that result from the scaling process by showing they correlate highly with existing measures of ideology from Congress, and with individual-level self-reported political views. Finally, we use these measures to study the relationship between ideology and age, social relationships and ideology, and the relationship between friend ideology and turnout.
