Soda Pop: A Time-Series Clustering, Alarming and Disease Forecasting Application

Author(s):  
Jeremiah Rounds ◽  
Lauren Charles-Smith ◽  
Courtney D. Corley

Objective
To introduce Soda Pop, an R/Shiny application designed to be a disease-agnostic time-series clustering, alarming, and forecasting tool to assist in disease surveillance “triage, analysis and reporting” workflows within the Biosurveillance Ecosystem (BSVE) [1]. In this poster, we highlight the new capabilities that Soda Pop brings to the BSVE, with an emphasis on the impact of methodological decisions.

Introduction
The Biosurveillance Ecosystem (BSVE) is a biological and chemical threat surveillance system sponsored by the Defense Threat Reduction Agency (DTRA). BSVE is intended to be a user-friendly, multi-agency, cooperative, modular and threat-agnostic platform for biosurveillance [2]. In BSVE, a web-based workbench presents the analyst with applications (apps) developed by various DTRA-funded researchers, which are deployed on demand in the cloud (e.g., Amazon Web Services). These apps aim to address emerging needs and refine capabilities to enable early warning of chemical and biological threats for multiple users across local, state, and federal agencies.

Soda Pop is an app developed by Pacific Northwest National Laboratory (PNNL) to meet the current needs of the BSVE for early warning and detection of disease outbreaks. Aimed at a diverse set of analysts, the application is agnostic to data source and spatial scale, enabling it to generalize across many diseases and locations. To achieve this, we placed particular emphasis on clustering and alerting of disease signals within Soda Pop without strong prior assumptions on the nature of observed disease counts.

Methods
Although designed to be agnostic to the data source, Soda Pop was initially developed and tested on data summarizing Influenza-Like Illness in military hospitals, obtained through collaboration with the Armed Forces Health Surveillance Branch. Currently, the data incorporated also include the CDC’s National Notifiable Diseases Surveillance System (NNDSS) tables [3] and the WHO’s Influenza A/B Influenza Data (FluNet) [4]. These data sources are now present in BSVE’s Postgres data storage for direct access.

Soda Pop is designed to automate the time-series tasks of data summarization, exploration, clustering, alarming and forecasting. Built as an R/Shiny application, Soda Pop is founded on the powerful statistical tool R [5]. Where applicable, Soda Pop facilitates nonparametric seasonal decomposition of time-series; hierarchical agglomerative clustering across reporting areas and between diseases within reporting areas; and a variety of alarming techniques including Exponential Weighted Moving Average alarms and Early Aberration Detection [6].

Soda Pop embeds these techniques within a user interface designed to enhance an analyst’s understanding of emerging trends in their data, and enables the inclusion of its graphical elements in their dossier for further tracking and reporting. The ultimate goal of this software is to facilitate the discovery of unknown disease signals and to increase the speed of detection of unusual patterns within these signals.

Conclusions
Soda Pop organizes common statistical disease surveillance tasks in a manner integrated with BSVE data source inputs and outputs. The app analyzes time-series disease data and supports a robust set of clustering and alarming routines that avoid strong assumptions on the nature of observed disease counts. This attribute allows for flexibility in the data source, spatial scale, and disease types, making Soda Pop useful to a wide range of analysts within the BSVE.

Keywords
BSVE; Biosurveillance; R/Shiny; Clustering; Alarming

Acknowledgments
This work was supported by the Defense Threat Reduction Agency under contract CB10082 with Pacific Northwest National Laboratory.

References
1. Dasey, Timothy, et al. “Biosurveillance Ecosystem (BSVE) Workflow Analysis.” Online Journal of Public Health Informatics 5.1 (2013).
2. http://www.defense.gov/News/Article/Article/681832/dtra-scientistsdevelop-cloud-based-biosurveillance-ecosystem. Accessed 9/6/2016.
3. Centers for Disease Control and Prevention. “National Notifiable Diseases Surveillance System (NNDSS).”
4. World Health Organization. “FluNet.” Global Influenza Surveillance and Response System (GISRS).
5. R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
6. Salmon, Maëlle, et al. “Monitoring Count Time Series in R: Aberration Detection in Public Health Surveillance.” Journal of Statistical Software [Online], 70.10 (2016): 1-35.
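
The Exponential Weighted Moving Average alarms named in the Soda Pop Methods can be illustrated with a short sketch. Soda Pop itself is built in R and the abstract does not give its parameters, so everything below is an assumption made for illustration: the function name `ewma_alarms`, the baseline length, the smoothing weight `lam`, the limit multiplier `k`, and the example counts. It is a generic textbook EWMA control chart written in Python, not the application's implementation.

```python
from math import sqrt
from statistics import mean, stdev


def ewma_alarms(counts, baseline_len=8, lam=0.3, k=3.0):
    """Flag time points whose EWMA exceeds an upper control limit.

    baseline_len, lam and k are illustrative defaults (assumptions),
    not settings taken from Soda Pop.
    """
    baseline = counts[:baseline_len]
    mu = mean(baseline)        # in-control level estimated from the baseline
    sigma = stdev(baseline)    # in-control spread estimated from the baseline
    z = mu                     # the EWMA starts at the baseline mean
    alarms = []
    for t, x in enumerate(counts[baseline_len:], start=1):
        z = lam * x + (1 - lam) * z
        # Standard deviation of the EWMA statistic after t updates
        sd_z = sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        alarms.append(z > mu + k * sd_z)
    return alarms


# Made-up weekly counts: eight quiet baseline weeks, then four monitored weeks
weekly_counts = [10, 12, 11, 9, 10, 11, 12, 10, 11, 10, 30, 35]
print(ewma_alarms(weekly_counts))  # [False, False, True, True]
```

Only an upper limit is checked, because in surveillance the concern is an unexpected rise in counts. A smaller `lam` smooths more heavily and reacts more slowly; a larger `k` trades sensitivity for fewer false alarms.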

Author(s):  
Moise C. Ngwa ◽  
Song Liang ◽  
Leonard Mbam ◽  
Mouhaman Arabi ◽  
Andrew Teboh ◽  
...  

Public health surveillance is essential for early detection and rapid response to cholera outbreaks. In 2003, Cameroon adopted the integrated disease surveillance and response (IDSR) strategy. We describe cholera surveillance within the IDSR strategy in Cameroon. Data are captured at the health facility and forwarded to the health district, which compiles them and sends them to the RDPH in paper format. The RDPH sends the data to the national level via the internet, and from there they are sent to the WHO. The surveillance system is passive, with no data analysis at the district level. Thus the IDSR strategy’s goal of data analysis and rapid response at the district level has not yet been met.


2019 ◽  
Vol 11 (1) ◽  
Author(s):  
Ta-Chien Chan ◽  
Yung-Chu Teng ◽  
Yen-Hua Chu ◽  
Tzu-Yu Lin

Objective
Sentinel physician surveillance in communities has played an important role in detecting early aberrations in epidemics. The traditional approach is to ask primary care physicians to actively report some diseases, such as influenza-like illness (ILI) and hand, foot, and mouth disease (HFMD), to health authorities on a weekly basis. However, this is labor-intensive and time-consuming work. In this study, we try to set up an automatic sentinel surveillance system to detect 23 syndromic groups in the communities.

Introduction
In December 2009, Taiwan’s CDC stopped its sentinel physician surveillance system. Currently, infectious disease surveillance systems in Taiwan rely on not only the national notifiable disease surveillance system but also real-time outbreak and disease surveillance (RODS) from emergency rooms, and the outpatient and hospitalization surveillance system from National Health Insurance data. However, the timeliness of data exchange and the number of monitored syndromic groups are limited. The spatial resolution of monitoring units is also too coarse, at the city level. Those systems can capture the epidemic situation at the nationwide level, but have difficulty reflecting the real epidemic situation in communities in a timely manner. Based on past epidemic experience, daily and small-area surveillance can detect early aberrations. In addition, emerging infectious diseases do not have typical symptoms at the early stage of an epidemic. Traditional disease-based reporting systems cannot capture this kind of signal. Therefore, we have set up a clinic-based surveillance system to monitor 23 kinds of syndromic groups. Through longitudinal surveillance and sensitive statistical models, the system can automatically remind medical practitioners of the epidemic situation of different syndromic groups, and will help them remain vigilant to susceptible patients. Local health departments can take action based on aberrations to prevent an epidemic from getting worse and to reduce the severity of the infected cases.

Methods
We collected data on 23 syndromic groups from participating clinics in Taipei City (in northern Taiwan) and Kaohsiung City (in southern Taiwan). The definitions of 21 of those syndromic groups with ICD-10 diagnoses were adopted from the International Society for Disease Surveillance (https://www.surveillancerepository.org/icd-10-cm-master-mapping-reference-table). The definitions of the other two syndromic groups, dengue-like illness and enterovirus-like illness, were suggested by infectious disease and emergency medicine specialists.

An enhanced sentinel surveillance system named “Sentinel plus” was designed for sentinel clinics and community hospitals. The system was designed with an interactive interface and statistical models for aberration detection. The data will be computed for different combinations of syndromic groups, age groups and gender groups. Every day, each participating clinic will automatically upload the data to the provider of the health information system (HIS) and then the data will be transferred to the research team.

This study was approved by the committee of the Institutional Review Board (IRB) at Academia Sinica (AS-IRB02-106262 and AS-IRB02-107139). The databases we used were all stripped of identifying information and thus informed consent of participants was not required.

Results
This system started to recruit clinics in May 2018. As of August 2018, there are 89 clinics in Kaohsiung City and 33 clinics and seven community hospitals in Taipei City participating in Sentinel plus. The recruiting process is still ongoing. On average, the monitored volumes of outpatient visits in Kaohsiung City and Taipei City are 5,000 and 14,000 per day, respectively.

Each clinic is provided with a list informing them of the relative importance of syndromic groups, the age distribution of each syndromic group and a time-series chart of outpatient rates at their own clinic. In addition, they can also view the village-level risk map, with different alert colors. In this way, medical practitioners can know what’s going on, not only in their own clinics and communities but also in the surrounding communities.

The Department of Health can see the current increasing and decreasing trends of the 23 syndromic groups, shown in red and blue, respectively (Figure 1). The spatial resolution has four levels: city, township, village and clinic. The map and bar chart represent the difference in outpatient rate between yesterday and the average for the past week. The line chart represents the daily outpatient rates for one selected syndromic group in the past seven days. The age distribution of each syndromic group and age-specific outpatient rates in different syndromic groups can be examined.

Conclusions
Sentinel plus is still at the early stage of development. The timeliness and the accuracy of the system will be evaluated by comparing with some syndromic groups in emergency rooms and the national notifiable disease surveillance system. The system is designed to assist with surveillance of not only infectious diseases but also some chronic diseases such as asthma. Integrating with external environmental data, Sentinel plus can alert public health workers to implement better intervention for the right population.
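
The quantity shown on the Sentinel plus map and bar chart, the difference in outpatient rate between yesterday and the average for the past week, can be sketched in a few lines of Python. The function names, the example rates, and the choice to average the seven days before yesterday (rather than a window that includes yesterday) are assumptions made for illustration; the abstract does not spell out these details.

```python
from statistics import mean


def rate_change(daily_rates, window=7):
    """Latest daily rate minus the mean of the `window` days before it.

    Assumption: the "past week" is the seven days preceding the latest day.
    """
    if len(daily_rates) < window + 1:
        raise ValueError("need at least window + 1 daily rates")
    yesterday = daily_rates[-1]
    past_week = daily_rates[-(window + 1):-1]
    return yesterday - mean(past_week)


def trend_color(change):
    """Red for an increasing trend and blue for a decreasing one, as in the abstract."""
    if change > 0:
        return "red"
    if change < 0:
        return "blue"
    return "none"  # the label for "no change" is an assumption


# Made-up outpatient rates (visits per 1,000) for one syndromic group, oldest first
rates = [2.0, 2.0, 3.0, 3.0, 2.0, 2.0, 0.0, 5.0]
change = rate_change(rates)
print(change, trend_color(change))  # 3.0 red
```

The same function can be applied at each of the four spatial levels (city, township, village, clinic) by passing the rate series for that unit.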


2016 ◽  
Vol 8 (1) ◽  
Author(s):  
Nicholas Generous ◽  
Geoffrey Fairchild ◽  
Alina Deshpande ◽  
Sara Y. Del Valle ◽  
Reid Priedhorsky

This poster establishes the utility of Wikipedia as a broadly effective data source for disease information, and we outline a path to a reliable, scientifically sound, operational, and global disease surveillance system that overcomes key gaps in existing traditional and internet-based techniques.


2021 ◽  
pp. 287-296
Author(s):  
Peter Katona

History shows us that individuals have used and likely will continue to use biological agents for terrorism purposes. Bioterrorism agents can be easily disseminated, cause severe disease and high mortality rates if cases are not treated properly, and pose significant challenges for management and response. A robust public health surveillance system that includes laboratory (including routine reportable disease surveillance), syndromic, and environmental surveillance is crucial for detection of the release of a bioterrorism agent and the resulting cases. This detection can then set into motion a robust and comprehensive public health response to minimize morbidity and mortality. A large-scale bioterrorism event would be unprecedented, straining and challenging every facet of medical and public health response and would quickly become a global emergency because of both the potential risk of infection and the shock to the global economy. A robust public health and medical workforce is necessary to respond effectively and efficiently to these types of events.


2003 ◽  
Vol 6 (4) ◽  
pp. 371-376 ◽  
Author(s):  
A Rütten ◽  
H Ziemainz ◽  
F Schena ◽  
T Stahl ◽  
M Stiggelbout ◽  
...  

Abstract

Objectives: The European Physical Activity Surveillance System (EUPASS) research project compared several physical activity (PA) measures (including the International Physical Activity Questionnaire (IPAQ)) in a time series survey in eight countries of the European Union. The present paper describes first results provided by the different instruments regarding PA participation, frequency and duration, both at the European and national levels. The purpose of the present study is to explore and compare the specific quality and usefulness of different indicators rather than to provide valid and reliable prevalence data. Thus, the main focus is on discussion of the methodological implications of the results presented.

Methods: A time series survey based on computer-aided telephone interviewing (CATI) was carried out in eight European countries over a six-month period. The study provided for about 100 realised interviews per month in each country (i.e. ~600 per country). Descriptive statistical analysis was used to: (1) report IPAQ results on vigorous, moderate and light PA and sitting, as well as on the overall measure of calories expenditure (MET min⁻¹), in the different countries; (2) compare these results with national PA indicators tested in EUPASS; and (3) compare IPAQ results with other European studies.

Results: First, the scores for the different PA categories as well as for the overall measure of calories expenditure provided by the IPAQ appeared rather high compared with previous studies and public health recommendations. Second, the different PA measurements used in EUPASS provided completely different results. For example, national indicators used in Germany and The Netherlands to date neither corresponded in absolute values (e.g. means of PA or sitting) nor correlated with the IPAQ in any significant way. Third, comparing EU countries, the ranking for vigorous, moderate and light activities by use of the IPAQ differed from that of other European studies. For example, in the present analysis, German respondents generally showed higher scores for PA than the Finns and the Dutch, while, in contrast, findings from other studies ranked Finland before The Netherlands and Germany.

Conclusions: The present analysis highlights some methodological implications of the IPAQ instrument. Among other things, differences in overall scores for PA as well as in the ranking of nations between the present results using IPAQ and other measures and studies may partly be due to the concepts of PA behind the measurements. Further analysis should investigate if the range of PA-related categories provided by the IPAQ is fully appropriate to measure all relevant daily activities; it may also consider the public health implications of mixing up different contexts of PA (e.g. work, leisure-time, transportation) in the IPAQ short version.


2020 ◽  
Author(s):  
Mehnaz Adnan ◽  
Xiaoying Gao ◽  
Xiaohan Bai ◽  
Elizabeth Newbern ◽  
Jill Sherwood ◽  
...  

BACKGROUND Over one-third of the population of Havelock North, New Zealand, approximately 5500 people, were estimated to have been affected by campylobacteriosis in a large waterborne outbreak. Cases reported through the notifiable disease surveillance system (notified case reports) are inevitably delayed by several days, resulting in slowed outbreak recognition and delayed control measures. Early outbreak detection and magnitude prediction are critical to outbreak control. It is therefore important to consider alternative surveillance data sources and evaluate their potential for recognizing outbreaks at the earliest possible time. OBJECTIVE The first objective of this study is to compare and validate the selection of alternative data sources (general practice consultations, consumer helpline, Google Trends, Twitter microblogs, and school absenteeism) for their temporal predictive strength for Campylobacter cases during the Havelock North outbreak. The second objective is to examine spatiotemporal clustering of data from alternative sources to assess the size and geographic extent of the outbreak and to support efforts to attribute its source. METHODS We combined measures derived from alternative data sources during the 2016 Havelock North campylobacteriosis outbreak with notified case report counts to predict suspected daily Campylobacter case counts up to 5 days before cases reported in the disease surveillance system. Spatiotemporal clustering of the data was analyzed using Local Moran’s I statistics to investigate the extent of the outbreak in both space and time within the affected area. RESULTS Models that combined consumer helpline data with autoregressive notified case counts had the best out-of-sample predictive accuracy for 1 and 2 days ahead of notified case reports. Models using Google Trends and Twitter typically performed the best 3 and 4 days before case notifications. 
Spatiotemporal clusters showed spikes in school absenteeism and consumer helpline inquiries that preceded the notified cases in the city primarily affected by the outbreak. CONCLUSIONS Alternative data sources can provide earlier indications of a large gastroenteritis outbreak compared with conventional case notifications. Spatiotemporal analysis can assist in refining the geographical focus of an outbreak and can potentially support public health source attribution efforts. Further work is required to assess the location of such surveillance data sources and methods in routine public health practice.
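
The Local Moran’s I statistic used in the Methods above can be sketched for a single time slice. This is a minimal illustration under stated assumptions, not the study’s analysis: the function name, the neighbor lists, the row-standardized weights, and the example counts are all hypothetical, and the study’s spatiotemporal extension is not reproduced. The sketch uses the common form I_i = (z_i / m2) * sum_j w_ij z_j, where z_i is the deviation of area i from the mean and m2 is the mean squared deviation.

```python
def local_morans_i(values, neighbors):
    """Local Moran's I for each area, using row-standardized neighbor weights.

    values    -- one observation per area (e.g. helpline calls on one day)
    neighbors -- neighbors[i] lists the indices of the areas adjacent to area i
    """
    n = len(values)
    mean_x = sum(values) / n
    z = [x - mean_x for x in values]
    m2 = sum(d * d for d in z) / n  # mean squared deviation
    if m2 == 0:
        raise ValueError("all values are identical; the statistic is undefined")
    result = []
    for i in range(n):
        nbrs = neighbors[i]
        # Spatial lag: average deviation among the neighbors of area i
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        result.append(z[i] * lag / m2)
    return result


# Four made-up areas in a row: two high-count areas next to two low-count areas
counts = [10, 9, 1, 2]
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(local_morans_i(counts, adjacency))
```

A positive value marks an area that resembles its neighbors (a high-high or low-low cluster), and a negative value marks a spatial outlier. In practice significance is usually judged by permutation, which is omitted here.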


2017 ◽  
Vol 9 (1) ◽  
Author(s):  
Donald E. Brannen ◽  
Melissa Branum ◽  
Amy Schmitt

Objective
Improve disease reporting and outbreak management.

Introduction
Specific communicable diseases have to be reported by law within a specific time period. In Ohio, prior to 2001, most of these diseases were reported on paper from providers to local health departments. In turn, the Communicable Disease Nurse mailed the hard copies to the Ohio Department of Health (ODH). In 2001 the Ohio Disease Reporting System (ODRS) was rolled out to all local public health agencies in Ohio [1]. ODRS is Ohio’s portion of the National Electronic Disease Surveillance System. ODRS should not be confused with syndromic surveillance systems, which are for detecting a disease outbreak before the disease itself is detected [2]. Chronic disease surveillance system data have been evaluated for long-term trends and potential enhancements [3]. However, the use of communicable disease reports varies greatly [4], and the export data have not routinely been used for quality improvement of the disease reporting process itself. In December 2014, Greene County Public Health (GCPH) began a project to improve reporting of communicable diseases and the response to disease outbreaks.

Methods
Initial efforts aimed to understand the current disease reporting process using quantitative management techniques, including creating a logic model and process map of the existing process and brainstorming and ranking issues. The diseases selected for study included: Campylobacteriosis, Cryptosporidiosis, E. coli O157:H7 & shiga toxin-producing E. coli, Giardiasis, Influenza-associated hospitalization, Legionnaires’ disease, Pertussis, Salmonellosis, and Shigellosis. The next steps included creating a data collection and analysis plan. An updated process map was created and the pre- and post-process maps were compared to identify areas to improve. The median number of days to report was compared before and after improvements were implemented. The impact of the process improvements on the median number of days to report was modeled. The impact in healthy days derived from the reduction in days to report (if any) was estimated.

Results
Process improvements identified: ensure all disease reporters use digital reporting methods, preferably starting with electronic laboratory reporting directly to the online disease reporting system, with other methods such as direct web data entry into the system, faxing lab reports, or secure emailing of reports, and with little or no hard copy mailing; centralize incoming email and fax reports (eliminating process steps); standardize backup staffing procedures for disease reporting staff; formalize incident command procedures under the authorized person in charge for every incident rather than distribute command between environmental and clinical services; and place communicable disease reporting under that single authority rather than clinical services. The days to report diseases were reduced from a median of 2 to 0.5 days (p<.001). Reporting improved for all the diseases except cryptosporidiosis, due to an outlier report two months late. The estimated societal healthy days saved were valued at $52,779 in the first eight months after implementation of the improvements.

Conclusions
Improvements in disease reporting decreased the reporting time from over 2 days to less than 1 day on average. The estimated value of societal healthy days saved by this project during the first 9 months was $52,779. Management of early command and control for outbreak response was improved.
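
The before-and-after comparison of median days to report can be sketched as follows. The abstract reports medians and a p-value but does not name the statistical test, so the permutation test below is a technique swapped in for illustration, not the authors’ method. The function name and the reporting delays are made up, chosen only so that the two medians match the reported 2 and 0.5 days.

```python
import random
from statistics import median


def median_permutation_test(before, after, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in medians.

    Returns (observed difference, estimated p-value). This is an assumed,
    assumption-light stand-in; the source does not say which test was used.
    """
    rng = random.Random(seed)
    observed = median(before) - median(after)
    pooled = list(before) + list(after)
    n_before = len(before)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = median(pooled[:n_before]) - median(pooled[n_before:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero
    return observed, (extreme + 1) / (n_permutations + 1)


# Made-up days-to-report values (not the study's data)
days_before = [2, 3, 2, 4, 2, 1, 3, 2, 5, 2]
days_after = [0.5, 1, 0.5, 0.5, 1, 0.5, 2, 0.5, 0.5, 1]
difference, p_value = median_permutation_test(days_before, days_after)
print(median(days_before), median(days_after), difference, p_value)
```

Medians are used rather than means because a single very late report, like the two-month outlier mentioned in the Results, would dominate an average.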


Author(s):  
Anne Fouillet ◽  
Marc Ruello ◽  
Lucie Leon ◽  
Cecile Sommen ◽  
Laurent Marie ◽  
...  

Objective
The presentation describes the design and the main functionalities of two user-friendly applications developed using R-shiny to support the statistical analysis of morbidity and mortality data from the French syndromic surveillance system SurSaUD.

Introduction
The French syndromic surveillance system SurSaUD® was set up in 2004 by Santé publique France, the national public health agency (formerly the French institute for public health, InVS). As of 2016, the system is based on three main data sources: attendances in about 650 emergency departments (ED), consultations at 62 emergency general practitioners’ (GPs) associations SOS Médecins, and mortality data from 3,000 civil status offices [1]. Daily, about 60,000 attendances in ED (88% of the national attendances), 8,000 visits in SOS Médecins associations (95% of the national visits) and 1,200 deaths (80% of the national mortality) are recorded all over the territory and transmitted to Santé publique France.

About 100 syndromic groupings of interest are constructed from the reported diagnostic codes, and monitored daily or weekly, for different age groups and geographical scales, to characterize trends, detect expected or unexpected events (outbreaks) and assess the potential impact of both environmental and infectious events. All-cause mortality is also monitored with similar objectives.

Two user-friendly interactive web applications have been developed using the R shiny package [2] to provide a homogeneous framework for all the epidemiologists involved in syndromic surveillance at the national and regional levels.

Methods
The first application, named MASS-SurSaUD, is dedicated to the analysis of the two morbidity data sources in SurSaUD, along with data provided by a network of Sentinel GPs [3]. Based on pre-aggregated data available daily at 10:30 am, R programs create daily, weekly and monthly time series of the proportion of each syndromic grouping among all visits/attendances with a valid code at the national and regional levels. Twelve syndromic groupings (mainly infectious and respiratory groups, like ILI, gastroenteritis, bronchiolitis, pulmonary diseases) and 13 age groups have been chosen for this application. For ILI, 3 statistical methods (periodic regression, robust periodic regression and Hidden Markov model) have been implemented to identify outbreaks. The results of the 3 methods applied to the 3 data sources are combined with a voting algorithm to compile the influenza alarm level for each region each week: non-epidemic, pre/post epidemic or epidemic.

The second application, named MASS-Euromomo, allows users to consult results provided by the model developed by the European project EuroMomo for the common analysis of mortality in the European countries (www.euromomo.eu). The Euromomo model, initially developed using Stata software, has been ported to R. The model has been adapted to run in France at national, regional and other geographical administrative levels, and for 7 age groups.

Results
The two applications, accessible on a web portal, are similarly designed, with:
- a dropdown menu and radio buttons on the left-hand side to select the data to display (e.g. filter by data source, age group, geographical levels, syndromic grouping and/or time period),
- several tab panels for consulting data and statistical results through tables, static and dynamic charts, a statistical alarm matrix, geographical maps, etc. (Figure 1),
- a “help” tab panel, including documentation and guidelines, links, and contact details.

The MASS-SurSaUD application was deployed in December 2015 and used during the 2015-2016 influenza season. The MASS-Euromomo application was deployed in July 2016 for the heat-wave surveillance period. Positive feedback has been reported by several users.

Conclusions
Business Intelligence tools are generally focused on data visualisation and are not generally tailored for providing advanced statistical analysis. Web applications built with the R-shiny package, combining user-friendly visualisations and advanced statistics, can be rapidly built to support timely epidemiological analyses and outbreak detection.

Figure 1: screen-shots of a page of the two applications
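
The voting step described in the Methods, where the results of 3 methods applied to 3 data sources are combined into one weekly alarm level per region, can be sketched as below. The abstract does not state the voting rule, so the share thresholds (`epidemic_share`, `pre_post_share`), the function name, and the example votes are hypothetical. The applications themselves are written in R; Python is used here only for illustration.

```python
SOURCES = ("ED", "SOS Médecins", "Sentinel GPs")
METHODS = ("periodic regression", "robust periodic regression", "hidden Markov model")


def influenza_alarm_level(votes, epidemic_share=0.5, pre_post_share=0.2):
    """Combine per-(source, method) outbreak flags into one alarm level.

    votes maps (source, method) -> True if that method flags an outbreak
    on that data source. Both thresholds are illustrative assumptions.
    """
    if not votes:
        raise ValueError("no votes supplied")
    share = sum(votes.values()) / len(votes)
    if share > epidemic_share:
        return "epidemic"
    if share >= pre_post_share:
        return "pre/post epidemic"
    return "non-epidemic"


# Made-up week for one region: every method alarms on ED and SOS Médecins data only
votes = {(s, m): s != "Sentinel GPs" for s in SOURCES for m in METHODS}
print(influenza_alarm_level(votes))  # epidemic (6 of 9 votes)
```

Keeping the thresholds as parameters makes the rule easy to tune against past seasons, which is one plausible reason to prefer a simple vote over a single model.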


10.2196/18281 ◽  
2020 ◽  
Vol 6 (3) ◽  
pp. e18281
Author(s):  
Mehnaz Adnan ◽  
Xiaoying Gao ◽  
Xiaohan Bai ◽  
Elizabeth Newbern ◽  
Jill Sherwood ◽  
...  

Background Over one-third of the population of Havelock North, New Zealand, approximately 5500 people, were estimated to have been affected by campylobacteriosis in a large waterborne outbreak. Cases reported through the notifiable disease surveillance system (notified case reports) are inevitably delayed by several days, resulting in slowed outbreak recognition and delayed control measures. Early outbreak detection and magnitude prediction are critical to outbreak control. It is therefore important to consider alternative surveillance data sources and evaluate their potential for recognizing outbreaks at the earliest possible time. Objective The first objective of this study is to compare and validate the selection of alternative data sources (general practice consultations, consumer helpline, Google Trends, Twitter microblogs, and school absenteeism) for their temporal predictive strength for Campylobacter cases during the Havelock North outbreak. The second objective is to examine spatiotemporal clustering of data from alternative sources to assess the size and geographic extent of the outbreak and to support efforts to attribute its source. Methods We combined measures derived from alternative data sources during the 2016 Havelock North campylobacteriosis outbreak with notified case report counts to predict suspected daily Campylobacter case counts up to 5 days before cases reported in the disease surveillance system. Spatiotemporal clustering of the data was analyzed using Local Moran’s I statistics to investigate the extent of the outbreak in both space and time within the affected area. Results Models that combined consumer helpline data with autoregressive notified case counts had the best out-of-sample predictive accuracy for 1 and 2 days ahead of notified case reports. Models using Google Trends and Twitter typically performed the best 3 and 4 days before case notifications. 
Spatiotemporal clusters showed spikes in school absenteeism and consumer helpline inquiries that preceded the notified cases in the city primarily affected by the outbreak. Conclusions Alternative data sources can provide earlier indications of a large gastroenteritis outbreak compared with conventional case notifications. Spatiotemporal analysis can assist in refining the geographical focus of an outbreak and can potentially support public health source attribution efforts. Further work is required to assess the location of such surveillance data sources and methods in routine public health practice.

