Missing data matters in participatory syndromic surveillance systems: comparative evaluation of missing data methods when estimating disease burden

Introduction: Traditional surveillance methods have been enhanced by the emergence of online participatory syndromic surveillance systems that collect health-related digital data. These systems have many applications, including tracking weekly prevalence of Influenza-Like Illness (ILI), predicting probable infection with Coronavirus Disease 2019 (COVID-19), and determining risk factors for ILI and COVID-19. However, not every volunteer consistently completes surveys. In this study, we assess how different missing data methods affect estimates of ILI burden using data from FluTracking, a participatory surveillance system in Australia. Methods: We estimate the incidence rate, the incidence proportion, and weekly prevalence using five missing data methods: available case, complete case, assume missing is non-ILI, multiple imputation (MI), and delta (δ) MI, which is a flexible and transparent method to impute missing data under Missing Not at Random (MNAR) assumptions. We evaluate these methods using simulated and FluTracking data. Results: Our simulations show that the optimal missing data method depends on the measure of ILI burden and the underlying missingness model. Of note, the δ-MI method provides estimates of ILI burden that are similar to the true parameter under MNAR models. When we apply these methods to FluTracking, we find that the δ-MI method accurately predicted complete, end-of-season weekly prevalence estimates from real-time data. Conclusion: Missing data is an important problem in participatory surveillance systems. Here, we show that accounting for missingness using statistical approaches leads to different inferences from the data.
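The three simplest methods named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `weekly_prevalence`, the data layout (one list of weekly reports per participant, with `None` for an uncompleted survey), and the toy data are all assumptions made for illustration. MI and δ-MI are not covered here.

```python
from typing import List, Optional, Sequence

# One weekly report: True = ILI, False = no ILI, None = survey not completed.
Report = Optional[bool]


def weekly_prevalence(
    reports: Sequence[Sequence[Report]], method: str
) -> List[Optional[float]]:
    """Weekly ILI prevalence under one of three simple missing-data methods.

    `reports` holds one sequence per participant, all of the same length
    (hypothetical layout, chosen for this sketch).
    """
    if method == "complete_case":
        # Keep only participants who answered every week.
        rows = [list(r) for r in reports if all(x is not None for x in r)]
    elif method == "missing_is_non_ili":
        # Treat every missing report as "no ILI".
        rows = [[bool(x) for x in r] for r in reports]
    elif method == "available_case":
        # Use whatever was actually reported in each week.
        rows = [list(r) for r in reports]
    else:
        raise ValueError(f"unknown method: {method}")

    n_weeks = len(reports[0]) if reports else 0
    prevalence: List[Optional[float]] = []
    for week in range(n_weeks):
        observed = [r[week] for r in rows if r[week] is not None]
        # No usable reports in a week means no estimate for that week.
        prevalence.append(sum(observed) / len(observed) if observed else None)
    return prevalence


# Invented toy data: 4 participants, 3 weeks.
toy_reports = [
    [True, False, False],
    [False, None, True],
    [None, None, False],
    [False, False, False],
]

for name in ("available_case", "complete_case", "missing_is_non_ili"):
    print(name, weekly_prevalence(toy_reports, name))
```

Even on this toy data the three methods give different weekly estimates, which is the point the abstract makes about the choice of method affecting inference.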


High added value of a population-based participatory surveillance system for community acute gastrointestinal, respiratory and influenza-like illnesses in Sweden, 2013–2014 using the web

Epidemiology and Infection ◽ 10.1017/s0950268816003290 ◽ 2017 ◽ Vol 145 (6) ◽ pp. 1193-1202 ◽ Cited By ~ 9

Author(s): A. PINI ◽ H. MERK ◽ A. CARNAHAN ◽ I. GALANIS ◽ E. VAN STRATEN ◽ ...

Keyword(s): Surveillance System ◽ Syndromic Surveillance ◽ Correlation Coefficients ◽ Health Agency ◽ Population Based ◽ Medical Advice ◽ Added Value ◽ Surveillance Systems ◽ Participatory Surveillance ◽ The Web

SUMMARY: In 2013–2014, the Public Health Agency of Sweden developed a web-based participatory surveillance system, Hälsorapport, based on a random sample of individuals reporting symptoms weekly online, to estimate the community incidence of self-reported acute gastrointestinal (AGI), acute respiratory (ARI) and influenza-like (ILI) illnesses and their severity. We evaluated Hälsorapport's acceptability, completeness, representativeness and its data correlation with other surveillance data. We calculated response proportions and Spearman correlation coefficients (r) between (i) incidence of illnesses in Hälsorapport and (ii) proportions of specific search terms on a medical-advice website and reasons for calling a medical advice hotline. Of 34 748 invitees, 3245 (9·3%) joined the cohort. Participants answered 81% (139 013) of the weekly questionnaires and 90% (16 351) of follow-up questionnaires. AGI incidence correlated with searches on winter-vomiting disease [r = 0·81, 95% confidence interval (CI) 0·69–0·89], and ARI incidence correlated with searches on cough (r = 0·77, 95% CI 0·62–0·86). ILI incidence correlated with the web query-based estimated incidence of ILI patients consulting physicians (r = 0·63, 95% CI 0·42–0·77). The high response to different questionnaires and the correlation with other syndromic surveillance systems suggest that Hälsorapport offers a reasonable representation of AGI, ARI and ILI patterns in the community and can complement traditional and syndromic surveillance systems to estimate their burden in the community.
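For readers unfamiliar with the statistic, a Spearman correlation coefficient of the kind reported above can be sketched in plain Python as the Pearson correlation of ranks. This is a generic illustration, not the study's analysis code; the function names `average_ranks` and `spearman` are assumptions, and no study data are reproduced here.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

In a setting like the one described, `x` would hold weekly incidence estimates and `y` the weekly proportion of a given search term; because only ranks are used, any monotone relationship yields a coefficient of 1 or −1.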


Health Information Privacy and Syndromic Surveillance Systems

PsycEXTRA Dataset ◽ 10.1037/e307182005-038 ◽ 2004 ◽ Cited By ~ 1

Author(s): Daniel Drociuk ◽ J. Gibson ◽ J. Hodge

Keyword(s): Health Information ◽ Syndromic Surveillance ◽ Information Privacy ◽ Surveillance Systems ◽ Health Information Privacy


ESSENCE II and the Framework for Evaluating Syndromic Surveillance Systems

PsycEXTRA Dataset ◽ 10.1037/e307182005-028 ◽ 2004 ◽ Cited By ~ 4

Author(s): Joseph S. Lombardo ◽ H. Burkom ◽ J. Pavlin

Keyword(s): Syndromic Surveillance ◽ Surveillance Systems


From Implementation to Automation A Step-by-Step Approach to Developing Syndromic Surveillance Systems from a Public Health Perspective

PsycEXTRA Dataset ◽ 10.1037/e307182005-058 ◽ 2004

Author(s): Brian M. Lawson ◽ E. Fitzhugh ◽ S. Hall ◽ L. Hutwagner ◽ G. Seeman

Keyword(s): Public Health ◽ Syndromic Surveillance ◽ Surveillance Systems ◽ Public Health Perspective


Using novel methodologies to support burden of disease estimates

European Journal of Public Health ◽ 10.1093/eurpub/ckaa165.951 ◽ 2020 ◽ Vol 30 (Supplement_5)

Author(s): T Hald

Keyword(s): International Development ◽ Traditional Approach ◽ Source Population ◽ Surveillance Systems ◽ Data Generation ◽ Time Data ◽ African Countries ◽ Indirect Measure ◽ True Incidence ◽ Laboratory Capacity

Abstract: A challenge to estimating the burden of diarrheal diseases, particularly in LMICs, where laboratory capacity and surveillance systems are limited, is obtaining valid estimates of etiology proportions of cases. A commonly used method is systematic review of studies reporting pathogen isolation in diarrhea cases. However, studies often differ in design, source population, timeframe, and pathogens included, hampering extrapolation to the target population. In a study co-funded by the Bill and Melinda Gates Foundation and the UK Department for International Development, we explore a novel approach for estimating diarrhea etiology proportions in urban and rural populations in four African countries. We analyse sewage samples using short-read next-generation sequencing (NGS) to determine abundance of genes that can be mapped to specific bacterial genera, providing an estimate of the relative abundance of specific pathogens in each sample. In parallel to collecting sewage samples, a questionnaire-based population survey will estimate diarrheal incidence. By combining results, pathogen-specific incidence will be estimated and compared with incidence estimates from the traditional approach. The application of NGS to human sewage has great potential for surveillance of foodborne infections, particularly in resource-poor settings where laboratory capacity for bacterial isolation is limited. First, NGS is a one-method-takes-all approach, as it is based on detection of RNA/DNA, a language common across pathogens. Second, it is culture independent, allowing for real-time data generation and standardized sharing. Finally, few samples are needed to survey large populations for several pathogens at the same time. Thus, surveillance based on NGS of sewage may prove to be an indirect measure of incidence. Although it will not provide an estimate of the true incidence in the population, it will increase our understanding of the burden and, as such, serve as a proxy and a novel way of ranking diseases.


Smarter Open Government Data for Society 5.0: Are Your Open Data Smart Enough?

Sensors ◽ 10.3390/s21155204 ◽ 2021 ◽ Vol 21 (15) ◽ pp. 5204

Author(s): Anastasija Nikiforova

Keyword(s): Industry 4.0 ◽ Economic Value ◽ Open Data ◽ Digital Data ◽ Open Government ◽ Data Sets ◽ Time Data ◽ Open Government Data ◽ Information And Communication ◽ Government Data

Nowadays, governments launch open government data (OGD) portals that provide data that can be accessed and used by everyone for their own needs. Although the potential economic value of open (government) data is assessed in millions and billions, not all open data are reused. Moreover, the open (government) data initiative as well as users’ intent for open (government) data are changing continuously, and today, in line with IoT and smart city trends, real-time data and sensor-generated data are of greater interest to users. These “smarter” open (government) data are also considered to be one of the crucial drivers of the sustainable economy, and might have an impact on information and communication technology (ICT) innovation and become a creativity bridge in developing a new ecosystem in Industry 4.0 and Society 5.0. The paper inspects the OGD portals of 60 countries in order to understand how well their content corresponds to Society 5.0 expectations. The paper reports on the extent to which countries provide these data, focusing on some open (government) data success-facilitating factors for both the portal in general and data sets of interest in particular. The presence of “smarter” data, their level of accessibility, availability, currency and timeliness, as well as support for users, are analyzed. A list of the most competitive countries by data category is provided. This makes it possible to understand which OGD portals react to users’ needs and to Industry 4.0 and Society 5.0 requests for the opening and updating of data for their further potential reuse, which is essential in the digital data-driven world.


Using data mining to handle missing data in multi-hop sensor network applications

Proceedings of the Ninth ACM International Workshop on Data Engineering for Wireless and Mobile Access - MobiDE '10 ◽ 10.1145/1850822.1850825 ◽ 2010 ◽ Cited By ~ 7

Author(s): Le Gruenwald ◽ Hanqing Yang ◽ Md. Shiblee Sadik ◽ Rahul Shukla

Keyword(s): Data Mining ◽ Missing Data ◽ Sensor Network ◽ Network Applications ◽ Using Data


Three-Dimensional Computerized Anthropometry of the Nose: Landmark Representation Compared to Surface Analysis

The Cleft Palate-Craniofacial Journal ◽ 10.1597/06-021 ◽ 2007 ◽ Vol 44 (3) ◽ pp. 278-285 ◽ Cited By ~ 14

Author(s): Virgilio F. Ferrario ◽ Fabrizio Mian ◽ Redento Peretta ◽ Riccardo Rosati ◽ Chiarella Sforza

Keyword(s): Three Dimensional ◽ Digital Data ◽ Limits Of Agreement ◽ B Splines ◽ Surface Areas ◽ Using Data ◽ Difference Volume ◽ T Values ◽ Plaster Models ◽ Plaster Casts

Objective: To compare three-dimensional nasal measurements directly made on subjects to those made on plaster casts, and nasal dimensions obtained with a surface-based approach to values obtained with a landmark representation. Methods: Soft-tissue nasal landmarks were directly digitized on 20 healthy adults. Stone casts of their noses were digitized and mathematically reconstructed using nonuniform rational B-spline (NURBS) curves. Linear distances, angles, volumes and surface areas were computed using facial landmarks and NURBS-reconstructed models (surface-based approach). Results: Measurements on the stone casts were somewhat smaller than values obtained directly from subjects (differences between −0.05 and −1.58 mm). Dahlberg's statistic ranged between 0.73 and 1.47 mm. Significant (p < .05) t values were found for 4 of 15 measurements. The surface-based approach gave values 3.5 (volumes) and 2.1 (surface area) times larger than those computed with the landmark-based method. The two values were significantly related (volume, r = 0.881; surface, r = 0.924; p < .001), and the resulting equations estimated actual values well (mean difference: volume −0.01 mm³, SD 1.47; area 0.05 cm², SD 1.44), with limits of agreement between −2.89 and 2.87 mm³ (volume) and between −2.88 and 2.78 cm² (area). Conclusions: Considering the characteristics of the two methods, and for practical purposes, nasal distances and angles obtained on plaster models were comparable to digital data obtained directly from subjects. Surface areas and volumes were best obtained using a surface-based approach, but could be estimated using data provided by the landmark representation.
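The volume limits of agreement quoted above are consistent with the common Bland-Altman convention of mean difference ± 1.96 SD. The sketch below assumes that convention (the abstract does not state which formula was used); the function names are hypothetical and chosen only for this illustration.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def limits_from_summary(
    mean_diff: float, sd_diff: float, z: float = 1.96
) -> Tuple[float, float]:
    """95% limits of agreement from a mean difference and its SD.

    Assumes the Bland-Altman convention: mean_diff +/- z * sd_diff.
    """
    return mean_diff - z * sd_diff, mean_diff + z * sd_diff


def limits_of_agreement(
    a: Sequence[float], b: Sequence[float], z: float = 1.96
) -> Tuple[float, float]:
    """Limits of agreement computed from paired measurements a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [x - y for x, y in zip(a, b)]
    # stdev is the sample standard deviation (n - 1 denominator).
    return limits_from_summary(mean(diffs), stdev(diffs), z)


# Volume summary figures taken from the abstract: mean difference -0.01, SD 1.47.
low, high = limits_from_summary(-0.01, 1.47)
print(round(low, 2), round(high, 2))  # -2.89 2.87
```

With the abstract's volume figures, −0.01 ± 1.96 × 1.47 gives −2.89 and 2.87, matching the reported limits.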


Does Interviewer Religious Dress Affect Survey Responses? Evidence from Morocco

Politics and Religion ◽ 10.1017/s1755048314000455 ◽ 2014 ◽ Vol 7 (4) ◽ pp. 734-760 ◽ Cited By ~ 22

Author(s): Lindsay J. Benstead

Keyword(s): Missing Data ◽ Social Dynamics ◽ Representative Survey ◽ Sensitive Questions ◽ Interviewer Effects ◽ Nationally Representative ◽ And Gender ◽ Using Data ◽ Religious Dress ◽ Survey Responses

Abstract: Few studies examine religiosity-of-interviewer effects, despite recent expansion of surveying in the Muslim world. Using data from a nationally-representative survey of 800 Moroccans conducted in 2007, this study investigates whether and why interviewer religiosity and gender affect responses to religiously-sensitive questions. Interviewer dress affects responses to four of six items, but effects are larger and more consistent for religious respondents, in support of power relations theory. Religious Moroccans provide less pious responses to secular-appearing interviewers, whom they may link to the secular state, and more religious answers to interviewers wearing hijab, in order to safeguard their reputation in a society that values piety. Interviewer traits do not affect the probability of item-missing data. Religiosity-of-interviewer effects depend on interviewer gender for questions about dress choice, a gendered issue closely related to interviewer dress. Interviewer gender and dress should be coded and controlled for to reduce bias and better understand social dynamics.


The Disciplinary Differences in the Characteristics and Effects of Non-Tenure-Track Faculty

Educational Evaluation and Policy Analysis ◽ 10.3102/01623737211030467 ◽ 2021 ◽ pp. 016237372110304

Author(s): Di Xu ◽ Florence Xiaotao Ran

Keyword(s): Tenure Track ◽ Subsequent Performance ◽ State College ◽ College System ◽ Tenure Track Faculty ◽ Engineering Mathematics ◽ Health Related ◽ Student Sorting ◽ Using Data

Using data with detailed instructor employment information from a state college system, this study examines disciplinary variations in the characteristics and effects of non-tenure-track faculty hired through temporary and long-term employment. We identify substantial differences in demographic and employment characteristics between the two types of non-tenure-line faculty, where the differences are most pronounced in science, technology, engineering, mathematics, and health-related fields (STEM) at 4-year colleges. Using an instrumental variables strategy to address student sorting, our analyses indicate that taking introductory courses with temporary adjuncts reduces subsequent interest, and the effects are particularly large in STEM fields at 4-year colleges. Long-term non-tenure faculty are generally comparable with tenure-track faculty in student subsequent interest, but tenure-track faculty are associated with better subsequent performance in a handful of fields.
