Physical activity, sedentary behaviour, and sleep on Twitter: A labelled dataset for public health research

Author(s):  
Zahra Shakeri Hossein Abad ◽  
Gregory P. Butler ◽  
Wendy Thompson ◽  
Joon Lee

ABSTRACT Advances in automated data processing, together with the unprecedented growth in user-generated social media (SM) content, have made public health surveillance (PHS) one of the long-lasting SM applications. However, the existing PHS systems feeding on SM data have not been widely deployed in national surveillance systems, which appears to stem from practitioners’ lack of trust in SM data. More robust datasets over which machine learning (ML) models can be trained and tested reliably are a significant step toward overcoming this hurdle. The health implications of physical activity, sedentary behaviour, and sleep (PASS) are widely studied through traditional data sources, which are often out of date, costly to collect, and thus limited in quantity and coverage. We present LPHEADA, a multicountry and fully Labelled digital Public HEAlth DAtaset of tweets originating in Australia, Canada, the United Kingdom, or the United States between November 2018 and June 2020. LPHEADA contains 366,405 labels for 122,135 PASS-related tweets and provides details about the place, time, and demographics associated with each tweet. LPHEADA is publicly available and can be utilized to develop (un)supervised ML models for digital PASS surveillance.

2021 ◽  
Author(s):  
Zahra Shakeri Hossein Abad ◽  
Gregory P. Butler ◽  
Wendy Thompson ◽  
Joon Lee

BACKGROUND Advances in automated data processing and machine learning (ML) models, together with the unprecedented growth in the number of social media users who publicly share and discuss health-related information, have made public health surveillance (PHS) one of the long-lasting social media applications. However, the existing PHS systems feeding on social media data have not been widely deployed in national surveillance systems, which appears to stem from a lack of trust in social media data among practitioners and the public. More robust and reliable datasets over which supervised machine learning models can be trained and tested are a significant step toward overcoming this hurdle. OBJECTIVE The health implications of daily behaviours, namely physical activity, sedentary behaviour, and sleep (PASS), as an evergreen topic in PHS, are widely studied through traditional data sources such as surveillance surveys and administrative databases, which are often several months out of date by the time they are utilized, costly to collect, and thus limited in quantity and coverage. In this paper, we present LPHEADA, a multicountry and fully Labelled digital Public HEAlth DAtaset of tweets originating in Australia, Canada, the United Kingdom (UK), or the United States (US). METHODS We collected the data for this study from Twitter using the Twitter livestream application programming interface (API) between 28 November 2018 and 19 June 2020. To obtain PASS-related tweets for manual annotation, we iteratively used regular expressions, unsupervised natural language processing, domain-specific ontologies, and linguistic analysis. We used Amazon Mechanical Turk (AMT) to label the collected data with self-reported PASS categories and implemented a quality control pipeline to monitor and manage the validity of crowd-generated labels. Moreover, we used ML, latent semantic analysis, linguistic analysis, and label inference analysis to validate different components of the dataset.
RESULTS LPHEADA contains 366,405 crowd-generated labels (three labels per tweet) for 122,135 PASS-related tweets, labelled by 708 unique annotators on AMT. In addition to crowd-generated labels, LPHEADA provides details about the three critical components of any PHS system: place, time, and demographics (gender, age range) associated with each tweet. CONCLUSIONS Publicly available datasets for digital PASS surveillance are usually isolated and only provide labels for small subsets of the data. We believe that the novelty and comprehensiveness of the dataset provided in this study will help develop, evaluate, and deploy digital PASS surveillance systems. LPHEADA will be an invaluable resource for both public health researchers and practitioners.
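The abstract above reports three crowd-generated labels per tweet (366,405 labels for 122,135 tweets). As a minimal sketch of how such triples could be collapsed into one label per tweet, the Python below applies a strict majority vote. This is an illustration only: the abstract mentions "label inference analysis" without naming a method, so majority voting is an assumption here, and the tweet IDs and label names are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(labels: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of annotators.

    Returns None when no label has more than half of the votes
    (for example, three annotators giving three different labels).
    """
    if not labels:
        return None
    top, top_count = Counter(labels).most_common(1)[0]
    # Strict majority: the top label must hold more than half of the votes.
    if top_count * 2 > len(labels):
        return top
    return None


# Hypothetical tweet IDs and label names, three crowd labels per tweet.
crowd_labels = {
    "tweet_1": ["self_reported", "self_reported", "not_self_reported"],
    "tweet_2": ["self_reported", "not_self_reported", "unclear"],
}

# One aggregated label per tweet; None marks tweets without agreement.
aggregated = {tweet_id: majority_label(votes) for tweet_id, votes in crowd_labels.items()}
```

Tweets that come back as `None` have no majority and would need a separate policy, such as additional annotation or exclusion.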


2020 ◽  
Author(s):  
Falaho Sani ◽  
Mohammed Hasen ◽  
Mohammed Seid ◽  
Nuriya Umer

Abstract Background: Public health surveillance systems should be evaluated periodically to ensure that problems of public health importance are being monitored efficiently and effectively. Despite the widespread measles outbreak in Ginnir district of Bale zone in 2019, no evaluation of the measles surveillance system had been conducted. Therefore, we evaluated the performance of the measles surveillance system and its key attributes in Ginnir district, Southeast Ethiopia. Methods: We conducted a concurrent embedded mixed quantitative/qualitative study in August 2019 among 15 health facilities/study units in Ginnir district. Health facilities were selected using the lottery method. The qualitative study involved 15 purposively selected key informants. Data were collected through face-to-face interviews and record review, using a semi-structured questionnaire adapted from the Centers for Disease Control and Prevention guidelines for evaluating public health surveillance systems. The quantitative findings were analyzed using Microsoft Excel 2016 and summarized by frequency and proportion. The qualitative findings were narrated and summarized by thematic area to supplement the quantitative findings. Results: Surveillance data flowed from the community to the respective upper level. An emergency preparedness and response plan was available only at the district level. Completeness of weekly reports was 95%, while timeliness was 87%. There was no regular analysis and interpretation of surveillance data, and the supportive supervision and feedback system was weak. The participation and willingness of surveillance stakeholders in implementing the system was good. The surveillance system was found to be useful, easy to implement, representative, and able to accommodate and adapt to changing conditions. Report documentation and data quality were poor at lower-level health facilities.
Stability of the system has been challenged by budget and logistics shortages, staff turnover, and a lack of refresher training. Conclusions: The surveillance system was acceptable, useful, simple, flexible, and representative. Data quality, timeliness, and stability of the system were attributes that require improvement. The overall performance of the measles surveillance system in the district was poor. Hence, regular analysis of data, preparation and dissemination of an epidemiological bulletin, capacity building, and regular supervision and feedback are recommended to enhance performance of the system.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Maroje Sorić ◽  
Kaja Meh ◽  
Paulo Rocha ◽  
Wanda Wendel-Vos ◽  
Ellen de Hollander ◽  
...  

Abstract Background Physical inactivity has been recognised as a global public health problem that requires concerted action. This calls for systematic physical activity (PA) surveillance as a mechanism for assessing the problem and evaluating the effectiveness of related policies. Because countries tend to design their policy measures based on national surveillance data, here we present an inventory of existing national surveillance systems on PA, sedentary behaviour (SB) and sport participation (SP) among the adult population in all European Union (EU) Member States. Methods As a part of the European Physical Activity and Sports Monitoring System (EUPASMOS) project, a questionnaire was constructed in the form of an on-line survey to collect detailed information on existing national surveillance systems on PA, SB, or SP. National HEPA focal points from all 27 EU Member States were invited to answer the on-line questionnaire, and data collection took place in the period May 2018–September 2019. Results National monitoring of PA, SB, or SP for adults has been established in 16/27 EU Member States, which host 33 different PA/SB/SP monitoring systems. Apart from 3 countries that are using accelerometers (Finland, Ireland and Portugal), surveillance is typically based on questionnaires. In most Member States these questionnaires have not been validated in the particular language and cultural setting. Next, the specific domains and dimensions of PA, SB and SP assessed vary considerably across countries. Only 3 countries (the Netherlands, Portugal and Slovenia) are monitoring all three behaviours while covering most of the domains and dimensions of PA/SB/SP. Lastly, as half of the existing surveillance systems set an upper age limit, in 9/16 countries that are monitoring PA/SB/SP, no data for people older than 80 years are available. Conclusions Systematic surveillance of PA is lacking in 11/27 EU countries, with even fewer monitoring SB and SP.
Besides, existing surveillance systems typically fail to assess all dimensions and domains of PA/SB/SP with only three countries maintaining monitoring systems that encompass all three behaviours while covering most of the domains and dimensions of PA/SB/SP. Hence, additional efforts in advocacy of systematic PA surveillance in the EU are called for.


2014 ◽  
Vol 6 (1) ◽  
Author(s):  
Rhonda A. Lizewski ◽  
Howard Burkom ◽  
Joseph Lombardo ◽  
Christopher Cuellar ◽  
Yevgeniy Elbert ◽  
...  

While other surveillance systems may only use death and admissions as severity indicators, these serious events may overshadow the more subtle severity signals based on appointment type, disposition from an outpatient setting, and whether the patient had to return for care because their condition had not improved. This abstract discusses how these additional data fields were utilized in a fusion model to improve the Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE).

