Mining Social Media Data for Biomedical Signals and Health-Related Behavior

Social media data have been increasingly used to study biomedical and health-related phenomena. From cohort-level discussions of a condition to population-level analyses of sentiment, social media have provided scientists with unprecedented amounts of data to study human behavior associated with a variety of health conditions and medical treatments. Here we review recent work in mining social media for biomedical, epidemiological, and social phenomena information relevant to the multilevel complexity of human health. We pay particular attention to topics where social media data analysis has shown the most progress, including pharmacovigilance and sentiment analysis, especially for mental health. We also discuss a variety of innovative uses of social media data for health-related applications as well as important limitations of social media data access and use.

Towards Approximating Population-Level Mental Health in Thailand Using Large-Scale Social Media Data

10.1007/978-3-030-91669-5_26, 2021, pp. 334-343

Author(s): Krittin Chatrinan, Anon Kangpanich, Tanawin Wichit, Thanapon Noraset, Suppawong Tuarob, ...

Keyword(s): Mental Health, Social Media, Large Scale, Population Level, Social Media Data, Media Data

Multitask Learning for Mental Health Conditions with Limited Social Media Data

10.18653/v1/e17-1015, 2017, Cited By ~ 24

Author(s): Adrian Benton, Margaret Mitchell, Dirk Hovy

Keyword(s): Mental Health, Social Media, Multitask Learning, Health Conditions, Social Media Data, Mental Health Conditions, Media Data

Automatic gender detection in Twitter profiles for health-related cohort studies

JAMIA Open, 10.1093/jamiaopen/ooab042, 2021, Vol 4 (2)

Author(s): Yuan-Chi Yang, Mohammed Ali Al-Garadi, Jennifer S Love, Jeanmarie Perrone, Abeed Sarker

Keyword(s): Social Media, Confidence Interval, Population Level, Machine Learning Algorithms, Support Vector, Social Media Data, Health Related, Twitter Users, Gender Detection, Media Data

Objective: Biomedical research involving social media data is gradually moving from population-level to targeted, cohort-level data analysis. Though crucial for biomedical studies, social media users’ demographic information (eg, gender) is often not explicitly known from profiles. Here, we present an automatic gender classification system for social media and illustrate how gender information can be incorporated into a social media-based health-related study. Materials and Methods: We used a large Twitter dataset composed of public, gender-labeled users (Dataset-1) for training and evaluating the gender detection pipeline. We experimented with machine learning algorithms including support vector machines (SVMs) and deep-learning models, and public packages including M3. We considered users’ information including profile and tweets for classification. We also developed a meta-classifier ensemble that strategically uses the predicted scores from the classifiers. We then applied the best-performing pipeline to Twitter users who have self-reported nonmedical use of prescription medications (Dataset-2) to assess the system’s utility. Results and Discussion: We collected 67 181 and 176 683 users for Dataset-1 and Dataset-2, respectively. A meta-classifier involving SVM and M3 performed the best (Dataset-1 accuracy: 94.4% [95% confidence interval: 94.0–94.8%]; Dataset-2: 94.4% [95% confidence interval: 92.0–96.6%]). Including automatically classified information in the analyses of Dataset-2 revealed gender-specific trends: proportions of females closely resemble data from the National Survey of Drug Use and Health 2018 (tranquilizers: 0.50 vs 0.50; stimulants: 0.50 vs 0.45) and data on overdose emergency room visits due to opioids from the Nationwide Emergency Department Sample (pain relievers: 0.38 vs 0.37). Conclusion: Our publicly available, automated gender detection pipeline may aid cohort-specific social media data analyses (https://bitbucket.org/sarkerlab/gender-detection-for-public).
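
The accuracies above are reported with 95% confidence intervals. The abstract does not say how those intervals were computed, so the following is only a minimal sketch of one common choice, the Wilson score interval for a proportion. The function name `wilson_interval` and the counts in the usage line are hypothetical and are not taken from the study.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a classification accuracy.

    `correct` and `total` are counts of correctly classified and evaluated
    users; z = 1.96 corresponds to a two-sided 95% interval.
    This is an illustrative assumption, not the method used in the paper.
    """
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical evaluation set: 944 of 1000 users classified correctly.
low, high = wilson_interval(944, 1000)
print(f"accuracy 94.4%, 95% CI {low:.3f} to {high:.3f}")
```

The interval narrows as the evaluation set grows, which would be consistent with the tighter interval reported for Dataset-1 than for Dataset-2 if the Dataset-2 evaluation sample was smaller.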

A Pipeline to Understand Emerging Illness Via Social Media Data Analysis: Case Study on Breast Implant Illness (Preprint)

10.2196/preprints.29768, 2021

Author(s): Vishal Dey, Peter Krasniak, Minh Nguyen, Clara Lee, Xia Ning

Keyword(s): Mental Health, Social Media, Natural Language Processing, Data Analysis, Natural Language, Language Processing, Breast Implant, Public Attention, Social Media Data, Media Data

Background: A new illness can come to public attention through social media before it is medically defined, formally documented, or systematically studied. One example is a condition known as breast implant illness (BII), which has been extensively discussed on social media, although it is vaguely defined in the medical literature. Objective: The objective of this study is to construct a data analysis pipeline to understand emerging illnesses using social media data and to apply the pipeline to understand the key attributes of BII. Methods: We constructed a pipeline of social media data analysis using natural language processing and topic modeling. Mentions related to signs, symptoms, diseases, disorders, and medical procedures were extracted from social media data using the clinical Text Analysis and Knowledge Extraction System. We mapped the mentions to standard medical concepts and then summarized these mapped concepts as topics using latent Dirichlet allocation. Finally, we applied this pipeline to understand BII from several BII-dedicated social media sites. Results: Our pipeline identified topics related to toxicity, cancer, and mental health issues that were highly associated with BII. Our pipeline also showed that cancers, autoimmune disorders, and mental health problems were emerging concerns associated with breast implants, based on social media discussions. Furthermore, the pipeline identified mentions such as rupture, infection, pain, and fatigue as common self-reported issues among the public, as well as concerns about toxicity from silicone implants. Conclusions: Our study could inspire future studies on the suggested symptoms and factors of BII. Our study provides the first analysis and derived knowledge of BII from social media using natural language processing techniques and demonstrates the potential of using social media information to better understand similar emerging illnesses.

Who’s Tweeting About the President? What Big Survey Data Can Tell Us About Digital Traces?

Social Science Computer Review, 10.1177/0894439318822007, 2019, Vol 38 (5), pp. 633-650, Cited By ~ 2

Author(s): Josh Pasek, Colleen A. McClain, Frank Newport, Stephanie Marken

Keyword(s): Social Media, Survey Data, Large Scale, Presidential Approval, Social Phenomena, Social Media Data, Complex Picture, Large Scale Survey, Media Data

Researchers hoping to make inferences about social phenomena using social media data need to answer two critical questions: What is it that a given social media metric tells us? And who does it tell us about? Drawing from prior work on these questions, we examine whether Twitter sentiment about Barack Obama tells us about Americans’ attitudes toward the president, the attitudes of particular subsets of individuals, or something else entirely. Specifically, using large-scale survey data, this study assesses how patterns of approval among population subgroups compare to tweets about the president. The findings paint a complex picture of the utility of digital traces. Although attention to subgroups improves the extent to which survey and Twitter data can yield similar conclusions, the results also indicate that sentiment surrounding tweets about the president is no proxy for presidential approval. Instead, after adjusting for demographics, these two metrics tell similar macroscale, long-term stories about presidential approval but very different stories at a more granular level and over shorter time periods.

Insulin pricing and other major diabetes-related concerns in the USA: a study of 46 407 tweets between 2017 and 2019

BMJ Open Diabetes Research & Care, 10.1136/bmjdrc-2020-001190, 2020, Vol 8 (1), pp. e001190

Author(s): Adrian Ahne, Francisco Orchard, Xavier Tannier, Camille Perchoux, Beverley Balkau, ...

Keyword(s): Social Media, Positive Emotions, Negative Impact, Diabetes Distress, Social Media Data, Health Related, The USA, Personal Content, Design And Methods, Media Data

Introduction: Little research has been done to systematically evaluate concerns of people living with diabetes through social media, which has been a powerful tool for social change and to better understand perceptions around health-related issues. This study aims to identify key diabetes-related concerns in the USA and primary emotions associated with those concerns using information shared on Twitter. Research design and methods: A total of 11.7 million diabetes-related tweets in English were collected between April 2017 and July 2019. Machine learning methods were used to filter tweets with personal content, to geolocate (to the USA) and to identify clusters of tweets with emotional elements. A sentiment analysis was then applied to each cluster. Results: We identified 46 407 tweets with emotional elements in the USA from which 30 clusters were identified; 5 clusters (18% of tweets) were related to insulin pricing, with both positive emotions (joy, love) referring to advocacy for affordable insulin and sadness emotions related to the frustration of insulin prices; 5 clusters (12% of tweets) were related to solidarity and support, with a majority of joy and love emotions expressed. The most negative topics (10% of tweets) were related to diabetes distress (24% sadness, 27% anger, 21% fear elements), to diabetic and insulin shock (45% anger, 46% fear) and comorbidities (40% sadness). Conclusions: Using social media data, we have been able to describe key diabetes-related concerns and their associated emotions. More specifically, we were able to highlight the real-world concerns of insulin pricing and its negative impact on mood. Using such data can be a useful addition to current measures that inform public decision making around topics of concern and burden among people with diabetes.

Comprehensive scoping review of health research using social media data

BMJ Open, 10.1136/bmjopen-2018-022931, 2018, Vol 8 (12), pp. e022931, Cited By ~ 3

Author(s): Joanna Taylor, Claudia Pagliari

Keyword(s): Public Health, Social Media, Scoping Review, Ethical Issues, Social Media Data, Related Research, Health Related Research, Health Related, Google Search, Media Data

Introduction: The rising popularity of social media, since their inception around 20 years ago, has been echoed in the growth of health-related research using data derived from them. This has created a demand for literature reviews to synthesise this emerging evidence base and inform future activities. Existing reviews tend to be narrow in scope, with limited consideration of the different types of data, analytical methods and ethical issues involved. There has also been a tendency for research to be siloed within different academic communities (eg, computer science, public health), hindering knowledge translation. To address these limitations, we will undertake a comprehensive scoping review, to systematically capture the broad corpus of published, health-related research based on social media data. Here, we present the review protocol and the pilot analyses used to inform it. Methods: A version of Arksey and O’Malley’s five-stage scoping review framework will be followed: (1) identifying the research question; (2) identifying the relevant literature; (3) selecting the studies; (4) charting the data and (5) collating, summarising and reporting the results. To inform the search strategy, we developed an inclusive list of keyword combinations related to social media, health and relevant methodologies. The frequency and variability of terms were charted over time and cross referenced with significant events, such as the advent of Twitter. Five leading health, informatics, business and cross-disciplinary databases will be searched: PubMed, Scopus, Association for Computing Machinery, Institute of Electrical and Electronics Engineers and Applied Social Sciences Index and Abstracts, alongside the Google search engine. There will be no restriction by date. Ethics and dissemination: The review focuses on published research in the public domain; therefore, no ethics approval is required. The completed review will be submitted for publication to a peer-reviewed, interdisciplinary open access journal, and to conferences on public health and digital research.

Participatory Social Sensor: A Framework to Social Media Data Acquisition and Analysis

10.5753/sbrc_estendido.2019.7765, 2019

Author(s): Igor Araujo, Paulo Henrique Lopes Rettore, João Guilherme Maia de Menezes

Keyword(s): Social Media, Data Acquisition, Visual Analysis, Social Behaviors, Data Access, Heterogeneous Data, Social Media Data, Analysis Process, Media Data, Social Sensor

Understanding urban mobility, transit, people’s viewpoints, and social behaviors has been the focus of much research and investment. However, data access is restricted to private companies and governments, and the cost of creating a sensor infrastructure in a given area is prohibitive. Location-Based Social Media (LBSM) may therefore provide a new way to better comprehend social behaviors from the users’ viewpoint. In this work, we propose the use of LBSM as participatory sensing and design the Participatory Social Sensor (PSS), a user-friendly framework for social media data acquisition and analysis. We develop the Twitter data acquisition and analysis process, aiming to achieve the user’s application goals through a setup file, where the user specifies the spatial area, temporal interval, tags, and other parameters. As a result, the PSS produces a set of visual analyses that provide a context overview, giving researchers an easy way to make decisions. A case study based on the PSS framework, Detection and Enrichment Service for Road Events Based on Heterogeneous Data Merger for VANETs, was published in the current conference.

Mental Health Consultations on College Campuses: Examining the Predictive Ability of Social Media

10.21203/rs.3.rs-196605/v1, 2021

Author(s): Koustuv Saha, Asra Yousuf, Ryan L. Boyd, James W. Pennebaker, Munmun De Choudhury

Keyword(s): Mental Health, College Students, Social Media, Collective Identity, Ground Truth, Treatment Needs, Ground Truth Data, Social Media Data, Mental Health Consultations, Media Data

The mental health of college students is a growing concern, and the mental health needs of college students are difficult to assess in real time and at scale. While social media has shown potential as a viable “passive sensor” of mental health, the construct validity and in-practice reliability of such computational assessments remain largely unexplored. Towards this goal, we study how assessments of college students’ mental health based on social media data correspond with ground-truth data on on-campus mental health consultations. For a large U.S. public university, we obtained ground-truth data of on-campus mental health consultations between 2011 and 2016, and collected 66,000 posts from the university’s Reddit community. We adopted machine learning and natural language methodologies to measure symptomatic mental health expressions of depression, anxiety, stress, suicidal ideation, and psychosis on the social media data. Seasonal auto-regressive integrated moving average (SARIMA) models forecasting on-campus mental health consultations showed that incorporating social media data led to predictions with r=0.86 and SMAPE=13.30, outperforming models without social media data by 41%. Our language analyses revealed that social media discussions during months with many mental health consultations centered on academics and career, whereas months with few consultations saliently showed expressions of positive affect, collective identity, and socialization. This study reveals that social media data can improve our understanding of college students’ mental health, particularly their mental health treatment needs.
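
The forecasts above are summarised with two metrics, Pearson's r and SMAPE (symmetric mean absolute percentage error). As a minimal sketch of how such scores can be computed, the code below uses one common SMAPE definition, the mean of |forecast - actual| divided by the average of |actual| and |forecast|, expressed in percent. The abstract does not state which SMAPE variant the authors used, and the monthly counts in the usage lines are invented for illustration.

```python
from math import sqrt
from typing import Sequence


def smape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        # A pair where both values are zero contributes no error.
        total += 0.0 if denom == 0 else abs(f - a) / denom
    return 100 * total / len(actual)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly consultation counts and model forecasts.
observed = [120, 150, 180, 90]
predicted = [110, 160, 170, 100]
print(f"r = {pearson_r(observed, predicted):.2f}, SMAPE = {smape(observed, predicted):.2f}")
```

Lower SMAPE and higher r both indicate forecasts that track the observed series more closely, so the two numbers are usually read together.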

Social Media Reveals Psychosocial Effects of the COVID-19 Pandemic

10.1101/2020.08.07.20170548, 2020

Author(s): Koustuv Saha, John Torous, Eric D. Caine, Munmun De Choudhury

Keyword(s): Mental Health, Social Media, Psychosocial Effects, Self Disclosure, Social Media Data, Health Concerns, Mental Health Concerns, Precautionary Measures, Media Data, Over Time

Background: The novel coronavirus disease 2019 (COVID-19) pandemic has caused several disruptions in personal and collective lives worldwide. The uncertainties surrounding the pandemic have also led to multi-faceted mental health concerns, which can be exacerbated by precautionary measures such as social distancing and self-quarantining, as well as by societal impacts such as economic downturn and job loss. Despite this being noted as a “mental health tsunami,” the psychological effects of the COVID-19 crisis remain unexplored at scale. Consequently, public health stakeholders are currently limited in identifying ways to provide timely and tailored support during these circumstances. Objective: Our work aims to provide insights regarding people’s psychosocial concerns during the COVID-19 pandemic by leveraging social media data. We aim to study the temporal and linguistic changes in symptomatic mental health and support expressions in the pandemic context. Methods: We obtain ∼60M Twitter streaming posts originating from the U.S. from 24 March to 24 May 2020, and compare these with ∼40M posts from a comparable period in 2019 to attribute the effect of COVID-19 on people’s social media self-disclosure. Using these datasets, we study people’s self-disclosure on social media in terms of symptomatic mental health concerns and expressions of support. We employ transfer learning classifiers that identify the social media language indicative of mental health outcomes (anxiety, depression, stress, and suicidal ideation) and support (emotional and informational support). We then examine the changes in psychosocial expressions over time and language, comparing the 2020 and 2019 datasets. Results: We find that all of the examined psychosocial expressions have significantly increased during the COVID-19 crisis: mental health symptomatic expressions have increased by ∼14%, and support expressions have increased by ∼5%, both thematically related to COVID-19. We also observe a steady decline and eventual plateauing in these expressions during the COVID-19 pandemic, which may have been due to habituation or due to supportive policy measures enacted during this period. Our language analyses highlight that people express concerns that are very specific to and contextually related to the COVID-19 crisis. Conclusions: We studied the psychosocial effects of the COVID-19 crisis by using social media data from 2020, finding that people’s mental health symptomatic and support expressions significantly increased during the COVID-19 period as compared to similar data from 2019. However, this effect gradually lessened over time, suggesting that people adapted to the circumstances and their “new normal”. Our linguistic analyses revealed that people expressed mental health concerns regarding personal and professional challenges, healthcare and precautionary measures, and pandemic-related awareness. This work shows the potential to provide insights to mental healthcare stakeholders and policymakers in planning and implementing measures to mitigate mental health risks amidst the health crisis.
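
Because the 2020 and 2019 collections differ in size (roughly 60M versus 40M posts), a year-over-year comparison has to be made on rates rather than raw counts. The sketch below shows one simple way to express such a change as a relative percentage. The abstract does not describe the exact computation behind the ∼14% and ∼5% figures, so the function `relative_change` and the counts in the usage lines are assumptions made purely for illustration.

```python
def relative_change(flagged_new: int, total_new: int,
                    flagged_old: int, total_old: int) -> float:
    """Percent change in the share of flagged posts between two corpora.

    Each share is flagged / total, so corpora of different sizes can be
    compared. Hypothetical helper, not the authors' procedure.
    """
    if total_new <= 0 or total_old <= 0:
        raise ValueError("corpus sizes must be positive")
    rate_new = flagged_new / total_new
    rate_old = flagged_old / total_old
    if rate_old == 0:
        raise ValueError("baseline rate is zero; relative change is undefined")
    return (rate_new / rate_old - 1) * 100


# Invented counts: the later corpus is larger, but only the rate matters.
change = relative_change(flagged_new=171, total_new=1500,
                         flagged_old=100, total_old=1000)
print(f"relative change: {change:.1f}%")
```

Comparing raw counts here would suggest a 71% rise, whereas the rate-based figure reflects only the change in how often such expressions occur.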
