Predicting Opioid Overdose Crude Rates with Text-Based Twitter Features (Student Abstract)

2020 ◽  
Vol 34 (10) ◽  
pp. 13787-13788
Author(s):  
Nupoor Gandhi ◽  
Alex Morales ◽  
Sally Man-Pui Chan ◽  
Dolores Albarracin ◽  
ChengXiang Zhai

Drug use reporting is often a bottleneck for modern public health surveillance; social media data provides a real-time signal that allows for tracking and monitoring opioid overdoses. In this work we focus on text-based feature construction for the task of predicting opioid overdose rates at the county level. Using a Twitter dataset with over 3.4 billion tweets, we explore semantic features, such as topic features, to show that social media could be a good indicator for forecasting opioid overdose crude rates in public health monitoring systems. In particular, combining topic and TF-IDF features with demographic features can predict opioid overdose rates at the county level.
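The feature combination described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy tweets, the demographic columns, the smoothed IDF variant, and the function names `tfidf_vectors` and `combine_features` are all assumptions made for illustration, and topic features are left out. The sketch only shows how per-county TF-IDF vectors might be concatenated with demographic features before being passed to a regressor.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (one common variant, assumed here), so a term that
    # appears in every document still receives a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


def combine_features(text_vectors, demographic_rows):
    """Concatenate each county's text vector with its demographic features."""
    return [tv + list(demo) for tv, demo in zip(text_vectors, demographic_rows)]


# Hypothetical toy data: one pooled "document" of tweet tokens per county.
county_tweets = [
    "overdose naloxone overdose clinic".split(),
    "football game tonight clinic".split(),
]
# Hypothetical demographic columns, e.g. (poverty rate, median age).
demographics = [(0.21, 34.5), (0.12, 41.0)]

vocab, text_vecs = tfidf_vectors(county_tweets)
features = combine_features(text_vecs, demographics)
```

Each row of `features` is one county's input vector; a real system would add topic-model features and fit a regression model against the reported crude rates.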

2021 ◽  
Author(s):  
Olivia Figueira ◽  
Yuka Hatori ◽  
Liying Liang ◽  
Christine Chye ◽  
Yuhong Liu

BMJ Open ◽  
2018 ◽  
Vol 8 (12) ◽  
pp. e022931 ◽  
Author(s):  
Joanna Taylor ◽  
Claudia Pagliari

Introduction The rising popularity of social media, since their inception around 20 years ago, has been echoed in the growth of health-related research using data derived from them. This has created a demand for literature reviews to synthesise this emerging evidence base and inform future activities. Existing reviews tend to be narrow in scope, with limited consideration of the different types of data, analytical methods and ethical issues involved. There has also been a tendency for research to be siloed within different academic communities (eg, computer science, public health), hindering knowledge translation. To address these limitations, we will undertake a comprehensive scoping review, to systematically capture the broad corpus of published, health-related research based on social media data. Here, we present the review protocol and the pilot analyses used to inform it. Methods A version of Arksey and O’Malley’s five-stage scoping review framework will be followed: (1) identifying the research question; (2) identifying the relevant literature; (3) selecting the studies; (4) charting the data and (5) collating, summarising and reporting the results. To inform the search strategy, we developed an inclusive list of keyword combinations related to social media, health and relevant methodologies. The frequency and variability of terms were charted over time and cross referenced with significant events, such as the advent of Twitter. Five leading health, informatics, business and cross-disciplinary databases will be searched: PubMed, Scopus, Association for Computing Machinery, Institute of Electrical and Electronics Engineers and Applied Social Sciences Index and Abstracts, alongside the Google search engine. There will be no restriction by date. Ethics and dissemination The review focuses on published research in the public domain; therefore, no ethics approval is required.
The completed review will be submitted for publication to a peer-reviewed, interdisciplinary open access journal, and conferences on public health and digital research.


2018 ◽  
Vol 45 (1) ◽  
pp. 136-136

Ji X, Chun SA, Cappellari P, et al. Linking and using social media data for enhancing public health analytics. Journal of Information Science 2016; 43: 221–245. DOI: 10.1177/0165551515625029 The authors regret that non-anonymised patient data was used from a social medical network without prior permission. With permission from the social medical network, the authors have anonymised the data and corrected the article. The online version of the article has been corrected.


10.2196/13038 ◽  
2019 ◽  
Vol 21 (9) ◽  
pp. e13038 ◽  
Author(s):  
Yongcheng Zhan ◽  
Zhu Zhang ◽  
Janet M Okamoto ◽  
Daniel D Zeng ◽  
Scott J Leischow

Background The popularity of JUUL (an e-cigarette brand) among youth has recently been reported in news media and academic papers, which has raised great public health concerns. Little research has been conducted on the age distribution, geographic distribution, approaches to buying JUUL, and flavor preferences pertaining to underage JUUL users. Objective The aim of this study was to analyze social media data related to demographics, methods of access, product characteristics, and use patterns of underage JUUL use. Methods We collected publicly available JUUL-related data from Reddit. We extracted and summarized the age, location, and flavor preference of subreddit UnderageJuul users. We also compared common and unique users between subreddit UnderageJuul and subreddit JUUL. The methods of purchasing JUULs were analyzed by manually examining the content of the Reddit threads. Results A total of 716 threads and 2935 comments were collected from the subreddit UnderageJuul before it was shut down. Most threads did not mention a specific age, but ages ranged from 13 years to greater than 21 years in those that did. Mango, mint, and cucumber were the most popular among the 7 flavors listed on JUUL’s official website, and 336 subreddit UnderageJuul threads mentioned 7 discreet approaches to circumvent relevant legal regulations to get JUUL products, the most common of which was purchasing JUUL from other Reddit users (n=181). Almost half of the UnderageJuul users (389/844, 46.1%) also participated in discussions on the main JUUL subreddit and sought information across multiple Reddit forums. Most (64/74, 86%) posters were from large metropolitan areas. Conclusions The subreddit UnderageJuul functioned as a forum to explore methods of obtaining JUUL and to discuss and recommend specific flavors before it was shut down. 
About half of those using UnderageJuul also used the more general JUUL subreddit, so a forum still exists where youths can attempt to share information on how to obtain JUUL and other products. Exploration of such social media data in real time for rapid public health surveillance could provide early warning for significant health risks before they become major public health threats.


2020 ◽  
Author(s):  
Stevie Chancellor ◽  
Steven A Sumner ◽  
Corinne David-Ferdon ◽  
Tahirah Ahmad ◽  
Munmun De Choudhury

BACKGROUND Online communities provide support for individuals looking for help with suicidal ideation and crisis. As community data are increasingly used to devise machine learning models to infer who might be at risk, there have been limited efforts to identify both risk and protective factors in web-based posts. These annotations can enrich and augment computational assessment approaches to identify appropriate intervention points, which are useful to public health professionals and suicide prevention researchers. OBJECTIVE This qualitative study aims to develop a valid and reliable annotation scheme for evaluating risk and protective factors for suicidal ideation in posts in suicide crisis forums. METHODS We designed a valid, reliable, and clinically grounded process for identifying risk and protective markers in social media data. This scheme draws on prior work on construct validity and the social sciences of measurement. We then applied the scheme to annotate 200 posts from r/SuicideWatch—a Reddit community focused on suicide crisis. RESULTS We documented our results on producing an annotation scheme that is consistent with leading public health information coding schemes for suicide and advances attention to protective factors. Our study showed high internal validity, and we have presented results that indicate that our approach is consistent with findings from prior work. CONCLUSIONS Our work formalizes a framework that incorporates construct validity into the development of annotation schemes for suicide risk on social media. This study furthers the understanding of risk and protective factors expressed in social media data. This may help public health programming to prevent suicide and computational social science research and investigations that rely on the quality of labels for downstream machine learning tasks.


10.2196/24471 ◽  
2021 ◽  
Vol 8 (11) ◽  
pp. e24471
Author(s):  
Stevie Chancellor ◽  
Steven A Sumner ◽  
Corinne David-Ferdon ◽  
Tahirah Ahmad ◽  
Munmun De Choudhury

Background Online communities provide support for individuals looking for help with suicidal ideation and crisis. As community data are increasingly used to devise machine learning models to infer who might be at risk, there have been limited efforts to identify both risk and protective factors in web-based posts. These annotations can enrich and augment computational assessment approaches to identify appropriate intervention points, which are useful to public health professionals and suicide prevention researchers. Objective This qualitative study aims to develop a valid and reliable annotation scheme for evaluating risk and protective factors for suicidal ideation in posts in suicide crisis forums. Methods We designed a valid, reliable, and clinically grounded process for identifying risk and protective markers in social media data. This scheme draws on prior work on construct validity and the social sciences of measurement. We then applied the scheme to annotate 200 posts from r/SuicideWatch—a Reddit community focused on suicide crisis. Results We documented our results on producing an annotation scheme that is consistent with leading public health information coding schemes for suicide and advances attention to protective factors. Our study showed high internal validity, and we have presented results that indicate that our approach is consistent with findings from prior work. Conclusions Our work formalizes a framework that incorporates construct validity into the development of annotation schemes for suicide risk on social media. This study furthers the understanding of risk and protective factors expressed in social media data. This may help public health programming to prevent suicide and computational social science research and investigations that rely on the quality of labels for downstream machine learning tasks.


2015 ◽  
Author(s):  
Evika Karamagioli

Background: As the use of social media creates huge amounts of data, a need arises for big data analysis to synthesize the information and determine which actions to take. Online communication channels such as Facebook, Twitter, Instagram, etc. provide a wealth of passively collected data that may be mined for public health purposes such as health surveillance, health crisis management, and last but not least health promotion and education. Objective: We explore the international bibliography on the potential role of, and perspectives on the use of, social media as a big data source for public health purposes. Method: Systematic literature review. Data extraction and synthesis were performed with the use of thematic analysis. Results: Examples of those currently collecting and analyzing big data from generated social content include scientists who are working with the Centers for Disease Control and Prevention to track the spread of flu by analyzing what users search for, and the World Health Organization, which is working on disaster management relief. But what exactly do we do with this big social media data? We can track real-time trends and understand them more quickly through the platforms and processing services. By processing this big social media data, it is possible to determine specific patterns in conversation topics, user behaviors, overall trends and influencers, sociodemographic characteristics, lifestyle behaviors, and social and cultural constructs. Conclusion: The key to fostering the convergence of big data and social media is to process and analyze the right data that may be mined for purposes of public health, so as to provide strategic insights for planning, execution and measurement of effective and efficient public health interventions. In this effort, political, economic and legal obstacles need to be seriously considered.


2021 ◽  
Author(s):  
Zahra Shakeri Hossein Abad ◽  
Gregory P. Butler ◽  
Wendy Thompson ◽  
Joon Lee

BACKGROUND Advances in automated data processing and machine learning (ML) models, together with the unprecedented growth in the number of social media users who publicly share and discuss health-related information, have made public health surveillance (PHS) one of the long-lasting social media applications. However, the existing PHS systems feeding on social media data have not been widely deployed in national surveillance systems, which appears to stem from practitioners' and the public’s lack of trust in social media data. More robust and reliable datasets over which supervised machine learning models can be trained and tested reliably are a significant step toward overcoming this hurdle. OBJECTIVE The health implications of daily behaviours (physical activity, sedentary behaviour, and sleep (PASS)), as an evergreen topic in PHS, are widely studied through traditional data sources such as surveillance surveys and administrative databases, which are often several months out of date by the time they are utilized, costly to collect, and thus limited in quantity and coverage. In this paper, we present LPHEADA, a multicountry and fully Labelled digital Public HEAlth DAtaset of tweets originating in Australia, Canada, the United Kingdom (UK), or the United States (US). METHODS We collected the data of this study from Twitter using the Twitter livestream application programming interface (API) between 28th November 2018 and 19th June 2020. To obtain PASS-related tweets for manual annotation, we iteratively used regular expressions, unsupervised natural language processing, domain-specific ontologies and linguistic analysis. We used Amazon Mechanical Turk (AMT) to label the collected data with self-reported PASS categories and implemented a quality control pipeline to monitor and manage the validity of crowd-generated labels. Moreover, we used ML, latent semantic analysis, linguistic analysis, and label inference analysis to validate different components of the dataset.
RESULTS LPHEADA contains 366,405 crowd-generated labels (three labels per tweet) for 122,135 PASS-related tweets, labelled by 708 unique annotators on AMT. In addition to crowd-generated labels, LPHEADA provides details about the three critical components of any PHS system: place, time, and demographics (gender, age range) associated with each tweet. CONCLUSIONS Publicly available datasets for digital PASS surveillance are usually isolated and only provide labels for small subsets of the data. We believe that the novelty and comprehensiveness of the dataset provided in this study will help develop, evaluate, and deploy digital PASS surveillance systems. LPHEADA will be an invaluable resource for both public health researchers and practitioners.

