Improving the Fusion of Outbreak Detection Methods with Supervised Learning

Acute infectious gastroenteritis (AGE) is among the leading causes of mortality in children less than 5 years of age worldwide. Many causative agents lead to this infection, with rotavirus being the most common pathogen in the past decade. However, rotavirus is now being progressively displaced by another agent, norovirus. Apart from viruses, bacteria such as Salmonella and Escherichia coli and parasites such as Entamoeba histolytica also contribute to AGE. These agents can be recognised by their respective biological markers, mainly specific antigens or genes that identify the causative pathogen. In addition, omics technologies are currently providing crucial insights into the diagnosis of acute infectious gastroenteritis at the molecular level. Recent advances in omics technologies could be an important tool to further elucidate the potential causative agents of AGE. This review explores the currently available biomarkers and antigens for the diagnosis and management of the different causative agents of AGE. Although multi-omics approaches are costly, the aim of using these technologies is to enable more robust discovery of novel antigens and biomarkers relevant to the management of AGE, which can eventually be translated into simpler and cheaper detection methods for future clinical settings. Thus, prediction of prognosis, virulence and drug susceptibility for active infections can be obtained. Integrating these capabilities into a new clinical workflow aims to further improve case management, risk prediction for hospital-acquired infections, outbreak detection, and antimicrobial accountability.

Comparison of Molecular Subtyping and Antimicrobial Resistance Detection Methods Used in a Large Multistate Outbreak of Extensively Drug-Resistant Campylobacter jejuni Infections Linked to Pet Store Puppies

Journal of Clinical Microbiology ◽

10.1128/jcm.00771-20 ◽

2020 ◽

Vol 58 (10) ◽

Author(s):

Lavin A. Joseph ◽

Louise K. Francois Watkins ◽

Jessica Chen ◽

Kaitlin A. Tagg ◽

Christy Bennett ◽

...

Keyword(s):

Antimicrobial Resistance ◽

Campylobacter Jejuni ◽

Multilocus Sequence Typing ◽

The United States ◽

Sequence Type ◽

Outbreak Detection ◽

Detection Methods ◽

Drug Resistant ◽

Molecular Subtyping ◽

ABSTRACT Campylobacter jejuni is a leading cause of enteric bacterial illness in the United States. Traditional molecular subtyping methods, such as pulsed-field gel electrophoresis (PFGE) and 7-gene multilocus sequence typing (MLST), provided limited resolution to adequately identify C. jejuni outbreaks and separate out sporadic isolates during outbreak investigations. Whole-genome sequencing (WGS) has emerged as a powerful tool for C. jejuni outbreak detection. In this investigation, 45 human and 11 puppy isolates obtained during a 2016–2018 outbreak linked to pet store puppies were sequenced. Core genome multilocus sequence typing (cgMLST) and high-quality single nucleotide polymorphism (hqSNP) analysis of the sequence data separated the isolates into the same two clades containing minor within-clade differences; however, cgMLST analysis does not require selection of an appropriate reference genome, making the method preferable to hqSNP analysis for Campylobacter surveillance and cluster detection. The isolates were classified as sequence type 2109 (ST2109)—a rarely seen MLST sequence type. PFGE was performed on 38 human and 10 puppy isolates; PFGE patterns did not reliably predict clustering by cgMLST analysis. Genetic detection of antimicrobial resistance determinants predicted that all outbreak-associated isolates would be resistant to six drug classes. Traditional antimicrobial susceptibility testing (AST) confirmed a high correlation between genotypic and phenotypic antimicrobial resistance determinations. WGS analysis linked C. jejuni isolates in humans and pet store puppies even when canine exposure information was unknown, aiding the epidemiological investigation during the outbreak. WGS data were also used to quickly identify the highly drug-resistant profile of these outbreak-associated C. jejuni isolates.
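
As a rough illustration of how cgMLST profiles can be compared, the sketch below counts allele differences between isolates. The isolate names, the number of loci, and the allele numbers are invented for illustration and are not taken from the study; missing loci are assumed to be coded as `None` and are skipped.

```python
from itertools import combinations

def allele_distance(profile_a, profile_b):
    """Count loci whose allele numbers differ, skipping loci missing in either profile."""
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )

# Hypothetical profiles: one allele number per core-genome locus (None = not called).
profiles = {
    "human_01": [1, 4, 2, 7, 3, 1],
    "puppy_01": [1, 4, 2, 7, 3, 2],
    "sporadic_01": [5, 9, 2, 8, 6, None],
}

# Pairwise allele differences between all isolates.
distances = {
    (x, y): allele_distance(profiles[x], profiles[y])
    for x, y in combinations(profiles, 2)
}
for pair, d in distances.items():
    print(pair, d)
```

In this toy data the human and puppy isolates differ at a single locus, while the sporadic isolate differs from both at four loci. Real cgMLST schemes cover far more loci, and clustering thresholds are scheme-specific.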

Approaches to the evaluation of outbreak detection methods

BMC Public Health ◽

10.1186/1471-2458-6-263 ◽

2006 ◽

Vol 6 (1) ◽

Cited By ~ 19

Author(s):

Rochelle E Watkins ◽

Serryn Eagleson ◽

Robert G Hall ◽

Lynne Dailey ◽

Aileen J Plant

Keyword(s):

Outbreak Detection ◽

Detection Methods

Automated use of WHONET and SaTScan to detect outbreaks of Shigella spp. using antimicrobial resistance phenotypes

Epidemiology and Infection ◽

10.1017/s0950268809990884 ◽

2009 ◽

Vol 138 (6) ◽

pp. 873-883 ◽

Cited By ~ 31

Author(s):

J. STELLING ◽

W. K. YIH ◽

M. GALAS ◽

M. KULLDORFF ◽

M. PICHEL ◽

...

Keyword(s):

Antimicrobial Resistance ◽

Disease Surveillance ◽

Laboratory Data ◽

Disease Outbreak ◽

Outbreak Detection ◽

Detection Methods ◽

Scan Statistic ◽

National Network ◽

Disease Outbreak Detection ◽

Shigella Spp

SUMMARY Antimicrobial resistance is a priority emerging public health threat, and the ability to promptly detect outbreaks caused by resistant pathogens is critical for resistance containment and disease control efforts. We describe and evaluate the use of an electronic laboratory data system (WHONET) and a space–time permutation scan statistic for semi-automated disease outbreak detection. In collaboration with WHONET-Argentina, the national network for surveillance of antimicrobial resistance, we applied the system to the detection of local and regional outbreaks of Shigella spp. We searched for clusters on the basis of genus, species, and resistance phenotype and identified 19 statistical ‘events’ in a 12-month period. Of the six known outbreaks reported to the Ministry of Health, four had good or suggestive agreement with SaTScan-detected events. The most discriminating analyses were those involving resistance phenotypes. Electronic laboratory-based disease surveillance incorporating statistical cluster detection methods can enhance infectious disease outbreak detection and response.
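
The core idea of a space-time permutation scan statistic can be sketched in a few lines. The example below is not SaTScan's implementation: it assumes a small table of invented case counts by area and day, computes the expected count for one hand-picked candidate cluster from the row and column totals, and scores it with a Poisson likelihood ratio. The search over all candidate clusters and the significance testing, which is typically done by Monte Carlo permutation, are omitted.

```python
import math

def expected_counts(counts):
    """counts[z][d] = cases in area z on day d. Returns expected counts mu[z][d],
    assuming each area's share of cases is constant over time."""
    total = sum(sum(row) for row in counts)
    area_totals = [sum(row) for row in counts]
    day_totals = [sum(col) for col in zip(*counts)]
    return [[a * d / total for d in day_totals] for a in area_totals]

def log_likelihood_ratio(observed, expected, total):
    """Poisson log likelihood ratio for one candidate cluster (excess cases only)."""
    if observed <= expected:
        return 0.0
    inside = observed * math.log(observed / expected)
    rest = total - observed
    outside = rest * math.log(rest / (total - expected)) if rest > 0 else 0.0
    return inside + outside

# Invented counts: 3 areas (rows) x 4 days (columns).
counts = [
    [2, 1, 6, 7],
    [3, 2, 2, 3],
    [1, 2, 2, 1],
]
total = sum(sum(row) for row in counts)
mu = expected_counts(counts)

# Candidate cluster chosen by hand: area 0 on days 2 and 3.
observed_cluster = counts[0][2] + counts[0][3]
expected_cluster = mu[0][2] + mu[0][3]
llr = log_likelihood_ratio(observed_cluster, expected_cluster, total)
print(observed_cluster, expected_cluster, round(llr, 3))
```

A resistance-phenotype analysis of the kind described above would, under the same assumptions, build one such table per phenotype.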

Learning Factorial Codes by Predictability Minimization

Neural Computation ◽

10.1162/neco.1992.4.6.863 ◽

1992 ◽

Vol 4 (6) ◽

pp. 863-879 ◽

Cited By ~ 85

Author(s):

Jürgen Schmidhuber

Keyword(s):

Unsupervised Learning ◽

Supervised Learning ◽

General Principle ◽

Novelty Detection ◽

The Other ◽

Detection Methods ◽

Local Algorithm ◽

The Novel ◽

Abstract Concepts ◽

Internal Representations

I propose a novel general principle for unsupervised learning of distributed nonredundant internal representations of input patterns. The principle is based on two opposing forces. For each representational unit there is an adaptive predictor, which tries to predict the unit from the remaining units. In turn, each unit tries to react to the environment such that it minimizes its predictability. This encourages each unit to filter "abstract concepts" out of the environmental input such that these concepts are statistically independent of those on which the other units focus. I discuss various simple yet potentially powerful implementations of the principle that aim at finding binary factorial codes (Barlow et al. 1989), i.e., codes where the probability of the occurrence of a particular input is simply the product of the probabilities of the corresponding code symbols. Such codes are potentially relevant for (1) segmentation tasks, (2) speeding up supervised learning, and (3) novelty detection. Methods for finding factorial codes automatically implement Occam's razor for finding codes using a minimal number of units. Unlike previous methods the novel principle has a potential for removing not only linear but also nonlinear output redundancy. Illustrative experiments show that algorithms based on the principle of predictability minimization are practically feasible. The final part of this paper describes an entirely local algorithm that has a potential for learning unique representations of extended input sequences.
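
The defining property of a factorial code, that the probability of a code word equals the product of the probabilities of its symbols, can be checked empirically on a small sample. The sketch below is only an illustrative test of that property on invented binary codes; it is not the predictability-minimization algorithm itself, and the function name `factorial_gap` is hypothetical.

```python
from collections import Counter
from itertools import product

def factorial_gap(codes):
    """Largest absolute difference between the empirical joint probability of a
    code word and the product of its per-unit marginal probabilities."""
    n = len(codes)
    width = len(codes[0])
    joint = Counter(codes)
    # Marginal probability that unit i is 1.
    p_one = [sum(c[i] for c in codes) / n for i in range(width)]
    gap = 0.0
    for word in product((0, 1), repeat=width):
        prod = 1.0
        for bit, p in zip(word, p_one):
            prod *= p if bit else 1.0 - p
        gap = max(gap, abs(joint[word] / n - prod))
    return gap

independent = [(0, 0), (0, 1), (1, 0), (1, 1)]  # units vary independently
redundant = [(0, 0), (0, 0), (1, 1), (1, 1)]    # second unit copies the first

print(factorial_gap(independent))
print(factorial_gap(redundant))
```

A gap of zero means the sample is consistent with a factorial code; the redundant code, where one unit is fully predictable from the other, shows a clear gap.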

Supervised Learning for Automated Infectious-Disease-Outbreak Detection

Online Journal of Public Health Informatics ◽

10.5210/ojphi.v11i1.9770 ◽

2019 ◽

Vol 11 (1) ◽

Cited By ~ 1

Author(s):

Stephane Ghozzi ◽

Benedikt Zacher ◽

Alexander Ullrich

Keyword(s):

Infectious Disease ◽

Machine Learning ◽

Time Series ◽

Supervised Learning ◽

Expert Knowledge ◽

Disease Outbreak ◽

Outbreak Detection ◽

Infectious Disease Outbreak ◽

Count Time Series ◽

Disease Outbreak Detection

Objective: By systematically scoring algorithms and integrating outbreak data through statistical learning, evaluate and improve the performance of automated infectious-disease-outbreak detection. The improvements should be directly relevant to epidemiological practice. A broader objective is to explore the usefulness of machine-learning approaches in epidemiology.

Introduction: Within the traditional surveillance of notifiable infectious diseases in Germany, not only are individual cases reported to the Robert Koch Institute, but also outbreaks themselves are recorded: A label is assigned by epidemiologists to each case, indicating whether it is part of an outbreak and of which. This expert knowledge represents, in the language of machine learning, a "ground truth" for the algorithmic task of detecting outbreaks from a stream of surveillance data. The integration of this kind of information in the design and evaluation of algorithms is called supervised learning.

Methods: Reported cases were aggregated weekly and divided into two count time series, one for endemic (not part of an outbreak) and one for epidemic cases. Two new algorithms were developed for the analysis of such time series: farringtonOutbreak is an adaptation of the standard method farringtonFlexible as implemented in the surveillance R package: It trains on endemic case counts but detects anomalies on total case counts. The second algorithm is hmmOutbreak, which is based on a hidden Markov model (HMM): A binary hidden state indicates whether an outbreak was reported in a given week, the transition matrix for this state is learned from the outbreak data and this state is integrated as a factor in a generalised linear model of the total case count. An explicit probability of being in a state of outbreak is then computed for each week (one week ahead) and a signal is generated if it is higher than a user-defined threshold.

To evaluate performance, we framed outbreak detection as a simple binary classification problem: Is there an outbreak in a given week, yes or no? Was a signal generated for this week, yes or no? One can thus count, for each time series, the true positives (outbreak data and signals agree), false positives, true negatives and false negatives. From those, classical performance scores can be computed, such as sensitivity, specificity, precision, F-score or area under the ROC curve (AUC).

For the evaluation with real-world data we used time series of reported cases of salmonellosis and campylobacteriosis for each of the 412 German counties over 9 years. We also ran simple simulations with different parameter sets, generating count time series and outbreaks with the sim.pointSource function of the surveillance R package.

Results: We have developed a supervised-learning framework for outbreak detection based on reported infections and outbreaks, proposing two algorithms and an evaluation method. hmmOutbreak performs overall much better than the standard farringtonFlexible, with e.g. a 60% improvement in sensitivity (0.5 compared to 0.3) at a fixed specificity of 0.9. The results were confirmed by simulations. Furthermore, the computation of explicit outbreak probabilities allows a better and clearer interpretation of detection results than the usual testing of the null hypothesis "is endemic".

Conclusions: Methods of machine learning can be usefully applied in the context of infectious-disease surveillance. Even a simple HMM shows large improvements and better interpretability: More refined methods, in particular semi-supervised approaches, thus look very promising. The systematic integration of available expert knowledge, in this case the recording of outbreaks, allows an evaluation of algorithmic performance that is of direct relevance to epidemiological practice, in contrast to the usual intrinsic statistical metrics. Beyond that, this knowledge can be readily used to improve that performance and, in the future, gain insights into outbreak dynamics. Moreover, other types of labels will be similarly integrated in automated surveillance analyses, e.g. user feedback on whether a signal was relevant (reinforcement learning) or messages on specialised internet platforms that were found to be useful warnings of international epidemic events.
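
The binary-classification evaluation described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the weekly outbreak labels and outbreak probabilities are invented, the threshold of 0.5 is an arbitrary choice, and the function name `detection_scores` is hypothetical.

```python
def detection_scores(outbreak, signal):
    """Compare weekly outbreak labels with weekly signals (both 0/1 sequences)."""
    pairs = list(zip(outbreak, signal))
    tp = sum(1 for o, s in pairs if o and s)
    fp = sum(1 for o, s in pairs if not o and s)
    tn = sum(1 for o, s in pairs if not o and not s)
    fn = sum(1 for o, s in pairs if o and not s)
    nan = float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else nan
    specificity = tn / (tn + fp) if tn + fp else nan
    precision = tp / (tp + fp) if tp + fp else nan
    f1 = 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else nan
    return {
        "tp": tp, "fp": fp, "tn": tn, "fn": fn,
        "sensitivity": sensitivity, "specificity": specificity,
        "precision": precision, "f1": f1,
    }

# Invented data: expert labels and per-week outbreak probabilities for 10 weeks.
outbreak = [0, 0, 1, 1, 0, 0, 0, 1, 0, 0]
probability = [0.1, 0.2, 0.7, 0.4, 0.1, 0.6, 0.2, 0.8, 0.1, 0.3]

threshold = 0.5  # user-defined; a signal is raised when the probability exceeds it
signal = [1 if p > threshold else 0 for p in probability]

scores = detection_scores(outbreak, signal)
print(scores)
```

Sweeping the threshold and recording sensitivity against specificity at each value would give the points of an ROC curve, from which the AUC can be estimated.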

The importance of age-specific data in routine syndromic surveillance

Online Journal of Public Health Informatics ◽

10.5210/ojphi.v9i1.7602 ◽

2017 ◽

Vol 9 (1) ◽

Author(s):

Roger Morbey ◽

Alex J. Elliot ◽

Gillian E. Smith

Keyword(s):

Public Health ◽

Hospital Admissions ◽

Age Distribution ◽

Age Groups ◽

Health Indicators ◽

Older Age ◽

Outbreak Detection ◽

Detection Methods ◽

Surveillance Systems ◽

Wide Range

Objective: To investigate whether aberration detection methods for syndromic surveillance would be more useful if data were stratified by age band.

Introduction: When monitoring public health incidents using syndromic surveillance systems, Public Health England (PHE) uses the age of the presenting patient as a key indicator to further assess the severity, impact of the incident, and to provide intelligence on the likely cause. However the age distribution of cases is usually not considered until after unusual activity has been identified in the all-ages population data. We assessed whether monitoring specific age groups contemporaneously could improve the timeliness, specificity and sensitivity of public health surveillance.

Methods: First, we examined a wide range of health indicators from the PHE syndromic surveillance systems to identify for further study those with the greatest seasonal variation in the age distribution of cases. Secondly, we examined the identified indicators to ascertain whether any age bands consistently lagged behind other age bands. Finally, we applied outbreak detection methods retrospectively to age-specific data, identifying periods of increased activity that were only detected or detected earlier when age-specific surveillance was used.

Results: Seasonal increases in respiratory indicators occurred first in younger age groups, with increases in children under 5 providing early warning of subsequent increases occurring in older age groups. Also, we found age-specific indicators improved the specificity of surveillance using indicators relating to respiratory and eye problems; identifying unusual activity that was less apparent in the all-ages population.

Conclusions: Routine surveillance of respiratory indicators in young children would have provided early warning of increases in older age groups, where the burden on health care usage, e.g. hospital admissions, is greatest. Furthermore this cross-correlation between ages occurred consistently even though the age distribution of the burden of respiratory cases varied between seasons. Age-specific surveillance can improve sensitivity of outbreak detection although all-age surveillance remains more powerful when case numbers are low.
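
One simple way to ask whether one age band consistently lags another is a lagged cross-correlation. The sketch below is an assumption about how such a check might look, not the method used in the study; the weekly counts are invented so that the older age band follows the under-5 series by two weeks.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def best_lag(leading, lagging, max_lag):
    """Return (lag, correlation) for the lag at which lagging[t + lag]
    correlates best with leading[t]."""
    scores = {}
    for lag in range(max_lag + 1):
        x = leading[: len(leading) - lag]
        y = lagging[lag:]
        scores[lag] = pearson(x, y)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]

# Invented weekly counts for a respiratory indicator in two age bands.
under_5 = [2, 3, 5, 9, 14, 18, 15, 10, 6, 4, 3, 2]
over_65 = [2, 2, 2, 3, 5, 9, 14, 18, 15, 10, 6, 4]

lag, corr = best_lag(under_5, over_65, max_lag=4)
print(lag, round(corr, 3))
```

With real surveillance data, seasonality and day-of-week effects would usually need to be removed before a lag estimate like this is meaningful.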

A multi-factor integration-based semi-supervised learning for address resolution protocol attack detection in SDIIoT

International Journal of Distributed Sensor Networks ◽

10.1177/15501477211059940 ◽

2021 ◽

Vol 17 (12) ◽

pp. 155014772110599

Author(s):

Zhong Li ◽

Huimin Zhuang

Keyword(s):

Internet Of Things ◽

Supervised Learning ◽

Software Defined Networking ◽

Training Data ◽

Detection Methods ◽

Industrial Internet Of Things ◽

Data Set ◽

Correct Judgment ◽

Industrial Internet ◽

Address Resolution Protocol

Address resolution protocol attacks remain rampant in the industrial Internet of things. Recently, many scholars have proposed applying the software-defined networking paradigm to the industrial Internet of things, since this paradigm offers flexible deployment of intelligent algorithms and global coordination capabilities. These advantages prompt us to propose a multi-factor integration-based semi-supervised learning address resolution protocol detection method deployed in software-defined networking, called MIS, specifically to address the problems of limited labeled training data and incomplete feature extraction in traditional address resolution protocol detection methods. In the MIS method, we design a multi-factor integration-based feature extraction method and propose a semi-supervised learning framework with differential priority sampling. MIS considers address resolution protocol attack features from different aspects to help the model make correct judgments. Meanwhile, the differential priority sampling enables the base learner in self-training to learn efficiently from the unlabeled samples with differences. We conduct experiments based on a real data set collected from a deepwater port and a simulated data set. The experiments show that MIS can achieve good performance in detecting address resolution protocol attacks with F1-measure, accuracy, and area under the curve of 97.28%, 99.41%, and 98.36% on average. Meanwhile, compared with fully supervised learning and other popular address resolution protocol detection methods, MIS also shows the best performance.
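
The self-training idea mentioned in this abstract can be illustrated with a generic loop. The sketch below is not MIS and does not implement differential priority sampling: it assumes a plain nearest-centroid base learner, two classes, invented two-dimensional feature vectors, and a simple rule that pseudo-labels the most confidently classified unlabeled samples in each round.

```python
def centroids(points, labels):
    """Mean feature vector per class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(cents, p):
    """Return (label, margin); margin is the gap between the two nearest centroids."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(c, p)) ** 0.5, y) for y, c in cents.items()
    )
    return dists[0][1], dists[1][0] - dists[0][0]

def self_train(labeled, labels, unlabeled, per_round=2, rounds=3):
    """Repeatedly pseudo-label the most confident unlabeled points and refit."""
    points, ys, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        cents = centroids(points, ys)
        ranked = sorted(pool, key=lambda p: predict(cents, p)[1], reverse=True)
        for p in ranked[:per_round]:
            points.append(p)
            ys.append(predict(cents, p)[0])
            pool.remove(p)
    return centroids(points, ys)

# Invented features: two labeled samples and five unlabeled ones.
labeled = [(0.0, 0.0), (5.0, 5.0)]
labels = ["normal", "attack"]
unlabeled = [(0.5, 0.2), (4.6, 5.1), (1.0, 0.8), (4.0, 4.4), (2.0, 2.2)]

model = self_train(labeled, labels, unlabeled)
print(predict(model, (0.3, 0.3))[0], predict(model, (4.8, 4.9))[0])
```

The sampling rule is the part a method like MIS would replace; here it is just a confidence ranking.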

Discriminating Cognitive Disequilibrium and Flow in Problem Solving: A Semi-Supervised Approach Using Involuntary Dynamic Behavioral Signals

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5378 ◽

2020 ◽

Vol 34 (01) ◽

pp. 420-427

Author(s):

Mononito Goswami ◽

Lujie Chen ◽

Artur Dubrawski

Keyword(s):

Problem Solving ◽

Supervised Learning ◽

Detection Methods ◽

Social Environments ◽

Affective States ◽

Body Movements ◽

Complexity Measures ◽

Low Dimensional ◽

Short Time ◽

Cognitive Disequilibrium

Problem solving is one of the most important 21st century skills. However, effectively coaching young students in problem solving is challenging because teachers must continuously monitor their cognitive and affective states, and make real-time pedagogical interventions to maximize their learning outcomes. It is an even more challenging task in social environments with limited human coaching resources. To lessen the cognitive load on a teacher and enable affect-sensitive intelligent tutoring, many researchers have investigated automated cognitive and affective detection methods. However, most of the studies use culturally-sensitive indices of affect that are prone to social editing such as facial expressions, and only a few studies have explored involuntary dynamic behavioral signals such as gross body movements. In addition, most current methods rely on expensive labelled data from trained annotators for supervised learning. In this paper, we explore a semi-supervised learning framework that can learn low-dimensional representations of involuntary dynamic behavioral signals (mainly gross-body movements) from a modest number of short time series segments. Experiments on a real-world dataset reveal a significant advantage of these representations in discriminating cognitive disequilibrium and flow, as compared to traditional complexity measures from dynamical systems literature, and demonstrate their potential in transferring learned models to previously unseen subjects.

Cloud detection in all-sky images via multi-scale neighborhood features and multiple supervised learning techniques

Atmospheric Measurement Techniques ◽

10.5194/amt-10-199-2017 ◽

2017 ◽

Vol 10 (1) ◽

pp. 199-208 ◽

Cited By ~ 11

Author(s):

Hsu-Yung Cheng ◽

Chih-Lung Lin

Keyword(s):

Supervised Learning ◽

Detection Methods ◽

Support Vector ◽

Detection Accuracy ◽

Cloud Detection ◽

Multi Scale ◽

Cloud Models ◽

Learning Techniques ◽

Image Patches ◽

Local Image

Abstract. Cloud detection is important for providing necessary information such as cloud cover in many applications. Existing cloud detection methods include red-to-blue ratio thresholding and other classification-based techniques. In this paper, we propose to perform cloud detection using supervised learning techniques with multi-resolution features. One of the major contributions of this work is that the features are extracted from local image patches with different sizes to include local structure and multi-resolution information. The cloud models are learned through the training process. We consider classifiers including random forest, support vector machine, and Bayesian classifier. To take advantage of the clues provided by multiple classifiers and various levels of patch sizes, we employ a voting scheme to combine the results to further increase the detection accuracy. In the experiments, we have shown that the proposed method can distinguish cloud and non-cloud pixels more accurately compared with existing works.
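
A voting scheme of the kind mentioned in this abstract can be sketched as a per-pixel majority vote. The abstract does not specify the exact voting rule, so plain majority voting is an assumption here, and the three prediction lists are invented stand-ins for the outputs of different classifiers or patch sizes (1 = cloud, 0 = non-cloud).

```python
from collections import Counter

def majority_vote(predictions):
    """predictions: one list of per-pixel labels per classifier or patch size.
    Returns the most common label for each pixel."""
    fused = []
    for votes in zip(*predictions):
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused

# Invented per-pixel outputs for five pixels from three voters.
random_forest = [1, 1, 0, 0, 1]
svm = [1, 0, 0, 1, 1]
bayes = [0, 1, 0, 0, 1]

fused = majority_vote([random_forest, svm, bayes])
print(fused)
```

Using an odd number of voters avoids ties for a two-class problem; with an even number, a tie-breaking rule would have to be chosen.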
