Pattern Mining for Outbreak Discovery Preparedness

Data Mining ◽  
2013 ◽  
pp. 2057-2068
Author(s):  
Zalizah Awang Long ◽  
Abdul Razak Hamdan ◽  
Azuraliza Abu Bakar ◽  
Mazrura Sahani

Today, the objective of a public health surveillance system is to reduce the impact of outbreaks by enabling appropriate intervention. Commonly used techniques detect an outbreak from changes or aberrations in health events relative to normal history. The main problem encountered in outbreak detection is a high false alarm rate: false alarms can lead to unnecessary interventions, and falsely detected outbreaks lead to costly investigations. In this chapter, the authors review data mining techniques, focusing on frequent and outlier mining, to develop a generic outbreak detection process model named the "Frequent-outlier" model. The process model was tested on a real dengue dataset obtained from FSK, UKM, and on a synthetic respiratory dataset obtained from AUTON LAB. ROC analysis was used to compare the overall performance of "Frequent-outlier" with CUSUM and Moving Average (MA). The results, evaluated using detection rate, false positive rate, and overall performance, were promising. An important outcome of this study is the set of knowledge rules derived from the notification of outbreak cases, to be used in countermeasure assessment for outbreak preparedness.
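The two baselines named above, CUSUM and Moving Average, both flag a day when counts drift above a "normal history". The sketch below shows one common form of each. It is not the authors' "Frequent-outlier" model. The parameter names and defaults (`k`, `h`, `window`, `n_sd`) are illustrative assumptions, not values from the chapter.

```python
from statistics import mean, stdev

def cusum_alarms(counts, baseline, k=0.5, h=4.0):
    """One-sided (upper) CUSUM on counts standardized by a historical baseline.

    k is the allowance and h the decision threshold, both in SD units
    (assumed defaults). Returns the indices of days that signal an alarm.
    """
    mu, sigma = mean(baseline), stdev(baseline)
    s, alarms = 0.0, []
    for t, x in enumerate(counts):
        z = (x - mu) / sigma
        s = max(0.0, s + z - k)  # accumulate only upward deviations
        if s > h:
            alarms.append(t)
            s = 0.0  # restart after signalling (one common convention)
    return alarms

def moving_average_alarms(counts, window=7, n_sd=2.0):
    """Flag day t when its count exceeds the mean of the previous `window`
    days by more than n_sd standard deviations of that window."""
    alarms = []
    for t in range(window, len(counts)):
        past = counts[t - window:t]
        mu, sigma = mean(past), stdev(past)
        if counts[t] > mu + n_sd * max(sigma, 1e-9):  # guard against zero SD
            alarms.append(t)
    return alarms
```

Lowering `h` or `n_sd` raises the detection rate but also the false alarm rate. Sweeping either parameter traces out the ROC curve used for the comparison.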


2019 ◽  
Vol 40 (Supplement_1) ◽  
Author(s):  
K Gudmundsson ◽  
P Lynga ◽  
A Langius-Eklof ◽  
E Hagglund ◽  
A Hagg-Martinell ◽  
...  

Abstract. Background: Daily body weight (BW) is a mainstay in the management of patients with chronic heart failure (HF). Guidelines recommend taking action if BW increases by more than 2 kg within 3 days. However, the evidence behind the 2 kg/3 d rule is unclear, and studies have shown poor diagnostic performance of this algorithm. Purpose: To assess the diagnostic value of different BW thresholds and time intervals for alerting to imminent HF decompensation. Methods: We studied 184 patients with HF (age 71±10 yr, EF 26±11%). 43% had been hospitalized for HF during the preceding year. They were assessed by daily BW using digital scales with direct data transfer to a central database. The mean follow-up was 286 days. To decrease day-to-day variability, BW was analysed as a daily moving average over 3 days. We retrospectively calculated the sensitivity and false-positive rate of BW thresholds at 1.5, 2.0, 2.5, 3.0 and 3.5 kg and time intervals between 2 and 30 days. Threshold crossings occurring within 30 days prior to a hospitalization for decompensated HF were deemed positive alerts. Results: The sensitivity of 2 kg/3 d was poor (13%). Prolonging the time interval of weight changes markedly improved sensitivity. Increasing the weight threshold decreased the false-positive rate. The greatest sensitivity (60%) was achieved using a 14-day interval at a weight threshold of 1.5 kg. However, this was associated with a high rate of false alerts (3.1 per patient-year). A weight threshold of 3.5 kg resulted in excellent specificity (0.3 false alerts per patient-year); however, sensitivity was low (20%, 20-day time interval). Conclusion: Monitoring daily BW using a 2 kg/3 d algorithm is associated with poor diagnostic performance. By analyzing stable trends over time (moving average) and using prolonged time intervals, BW monitoring with digital scales can achieve clinically meaningful diagnostic performance. This new approach to BW monitoring may improve early detection of imminent HF decompensation.
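Below is a minimal sketch of the alerting scheme described above: a 3-day moving average of daily weight, with an alert when the smoothed weight has risen by at least a threshold over a given interval. It assumes a specific alert rule: the smoothed weight today minus the smoothed weight `interval_days` earlier reaches `threshold_kg`. The abstract does not spell out the exact rule, and the function names are hypothetical.

```python
def moving_average(values, window=3):
    """Trailing moving average; the first window-1 days get None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

def weight_alerts(daily_bw, threshold_kg=2.0, interval_days=3, window=3):
    """Indices of days where the smoothed weight has risen by at least
    threshold_kg relative to the smoothed weight interval_days earlier."""
    ma = moving_average(daily_bw, window)
    alerts = []
    for t in range(interval_days, len(ma)):
        now, before = ma[t], ma[t - interval_days]
        if now is not None and before is not None and now - before >= threshold_kg:
            alerts.append(t)
    return alerts
```

Each combination of threshold (1.5 to 3.5 kg) and interval (2 to 30 days) in the study defines one rule. Scoring the resulting alerts against later hospitalizations gives that rule's sensitivity and false-alert rate.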


2021 ◽  
Vol 14 (1) ◽  
pp. 244-256
Author(s):  
Gokulapriya Raman ◽  
Ganesh Raj ◽  

Web usage behaviour mining is a substantial research problem because it identifies different users' behaviour patterns by analysing web log files. However, the accuracy of identifying users' frequently accessed web patterns has been limited, and doing so requires considerable time. The Mutual Information Pre-processing based Broken-Stick Linear Regression (MIP-BSLR) technique is proposed to improve the accuracy of web user behaviour pattern mining. Initially, web log files from the Apache web log dataset and the NASA dataset are taken as input. Then, the Mutual Information based Pre-processing (MI-P) method is applied to compute the mutual dependence between two web patterns. Based on the computed value, relevant web access patterns are retained for further processing and irrelevant patterns are removed. After that, Broken-Stick Linear Regression analysis (BLRA) is performed in MIP-BSLR for web user behaviour analysis, and it identifies the frequently visited web patterns. From these patterns, the MIP-BSLR technique accurately predicts the usage behaviour of web users and improves the performance of web usage behaviour mining. MIP-BSLR is evaluated on pattern mining accuracy, false positives, time requirements and space requirements with respect to the number of web patterns. On the Apache web log dataset, the proposed technique improves pattern mining accuracy by 14% and reduces the false positive rate by 52%, time requirement by 19% and space complexity by 21% compared with conventional methods. Similarly, on the NASA dataset, pattern mining accuracy increases by 16%, with reductions of 47% in false positive rate, 20% in time requirement and 22% in space complexity compared with conventional methods.
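The MI-P step measures the mutual dependence between two web patterns. The sketch below shows one way to compute that. It assumes each session is represented as a set of visited page identifiers and each pattern is the binary event "this session contains page X". That representation is an assumption, and this is plain mutual information, not necessarily the paper's exact MI-P procedure.

```python
from math import log2

def mutual_information(sessions, page_a, page_b):
    """Mutual information (in bits) between the binary events
    'session contains page_a' and 'session contains page_b'."""
    n = len(sessions)
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for s in sessions:
        counts[(int(page_a in s), int(page_b in s))] += 1
    mi = 0.0
    for (a, b), c in counts.items():
        if c == 0:
            continue  # 0 * log(0) is taken as 0
        p_ab = c / n
        p_a = (counts[(a, 0)] + counts[(a, 1)]) / n  # marginal of page_a event
        p_b = (counts[(0, b)] + counts[(1, b)]) / n  # marginal of page_b event
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi
```

Pairs that always co-occur score high, while independent pairs score near zero. A relevance cut-off on this value (its level is not given in the abstract) would then separate retained from discarded patterns.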


2020 ◽  
Author(s):  
Tom Duchemin ◽  
Angela Noufaily ◽  
Mounia N. Hocine

Surveillance for infectious disease outbreaks or other processes sometimes needs to be implemented simultaneously at multiple sites to detect local events. Sick leave can be monitored across companies to detect local outbreaks and identify company-related issues, such as local spread of infectious diseases or bad management practices. In this context, we propose an adaptation of the quasi-Poisson regression-based Farrington algorithm for multi-site surveillance. The proposed algorithm consists of a negative binomial mixed-effects regression with a new re-weighting procedure that accounts for past outbreaks and increases the sensitivity of the model. We perform a wide range of simulations to assess the performance of the model in terms of false positive rate and probability of detection. We present an application to sick leave rates in the context of COVID-19. The proposed algorithm provides good overall performance and opens up new opportunities for multi-site data surveillance.
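The Farrington family of methods fits a quasi-Poisson model to past counts and raises an alarm when the current count exceeds an upper prediction bound. The sketch below strips this down to an intercept-only model with a normal approximation, so it only illustrates how overdispersion widens the threshold. It is neither the full Farrington algorithm (no trend, seasonality, power transformation or outbreak re-weighting) nor the authors' negative binomial mixed model, and the function names are hypothetical.

```python
from statistics import NormalDist, mean

def quasi_poisson_threshold(baseline, alpha=0.01):
    """Upper alarm threshold for the current count under an intercept-only
    quasi-Poisson model fitted to baseline counts (normal approximation)."""
    n = len(baseline)
    mu = mean(baseline)
    # Pearson estimate of the dispersion parameter, not allowed below 1
    # (Farrington-type methods do not model under-dispersion).
    phi = max(1.0, sum((y - mu) ** 2 / mu for y in baseline) / (n - 1))
    z = NormalDist().inv_cdf(1 - alpha)
    # Prediction variance = variance of a new count + variance of the fitted mean.
    return mu + z * (phi * mu * (1 + 1 / n)) ** 0.5

def is_alarm(current, baseline, alpha=0.01):
    return current > quasi_poisson_threshold(baseline, alpha)
```

In a multi-site setting, a naive version would run this per site. A mixed-effects model, by contrast, lets sites share information about the baseline level.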


Author(s):  
L. Chen ◽  
F. Rottensteiner ◽  
C. Heipke

In this paper we describe learning a descriptor based on the Siamese Convolutional Neural Network (CNN) architecture and evaluate our results on a standard patch comparison dataset. The descriptor learning architecture is composed of an input module, a Siamese CNN descriptor module and a cost computation module based on the L2 norm. The cost function we use pulls the descriptors of matching patches close to each other in feature space while pushing the descriptors of non-matching pairs away from each other. Compared to related work, we optimize the training parameters by combining a moving average strategy for gradients with Nesterov's Accelerated Gradient. Experiments show that our learned descriptor performs well and achieves state-of-the-art results in terms of the false positive rate at 95% recall on standard benchmark datasets.
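The cost described above pulls matching descriptors together and pushes non-matching ones apart, using the L2 norm. The sketch below implements the standard margin-based contrastive loss, which behaves that way. The abstract does not give the exact cost function, so this particular form and its `margin` parameter are assumptions.

```python
from math import sqrt

def l2_distance(u, v):
    """Euclidean (L2) distance between two descriptor vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(desc_a, desc_b, is_match, margin=1.0):
    """Margin-based contrastive loss for one descriptor pair.

    Matching pairs are penalized by their squared distance (pulled together).
    Non-matching pairs are penalized only if closer than `margin` (pushed apart).
    """
    d = l2_distance(desc_a, desc_b)
    if is_match:
        return d ** 2
    return max(0.0, margin - d) ** 2
```

The margin keeps non-matching pairs that are already far apart from contributing gradient, so training focuses on hard negatives.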


2002 ◽  
Vol 41 (01) ◽  
pp. 37-41 ◽  
Author(s):  
S. Shung-Shung ◽  
S. Yu-Chien ◽  
Y. Mei-Due ◽  
W. Hwei-Chung ◽  
A. Kao

Summary Aim: Even with careful observation, the overall false-positive rate of laparotomy remains 10-15% when acute appendicitis is suspected. Therefore, the clinical efficacy of the Tc-99m HMPAO labeled leukocyte (TC-WBC) scan for diagnosing acute appendicitis in patients presenting with atypical clinical findings is assessed. Patients and Methods: Eighty patients presenting with acute abdominal pain and possible acute appendicitis but atypical findings were included in this study. After intravenous injection of TC-WBC, serial anterior abdominal/pelvic images were obtained with a gamma camera at 30, 60, 120 and 240 min with 800k counts. Any abnormal localization of radioactivity in the right lower quadrant of the abdomen, equal to or greater than bone marrow activity, was considered a positive scan. Results: 36 of the 49 patients with positive TC-WBC scans underwent appendectomy, and all had positive pathological findings. Five positive TC-WBC scans were due not to acute appendicitis but to other pathological lesions. Eight patients were not operated on, and clinical follow-up after one month revealed no acute abdominal condition. Three of the 31 patients with negative TC-WBC scans underwent appendectomy; they also had positive pathological findings. The remaining 28 patients did not undergo surgery and showed no evidence of appendicitis after at least one month of follow-up. The overall sensitivity, specificity, accuracy, positive predictive value and negative predictive value of the TC-WBC scan for diagnosing acute appendicitis were 92, 78, 86, 82 and 90%, respectively. Conclusion: The TC-WBC scan provides a rapid and highly accurate method for diagnosing acute appendicitis in patients with equivocal clinical examination. It proved useful in reducing the false-positive rate of laparotomy and in shortening the time needed for clinical observation.
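The five figures reported above all come from a 2×2 table of scan result against final diagnosis. The sketch below gives their standard definitions. The counts in the test are hypothetical and are not the study's table.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from a 2x2 table of test vs. truth."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased patients with a positive test
        "specificity": tn / (tn + fp),  # non-diseased patients with a negative test
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),  # positive tests that are truly diseased
        "npv": tn / (tn + fn),  # negative tests that are truly non-diseased
    }
```

As a check against the summary, the negative scans comprise 28 true negatives and 3 false negatives. That gives NPV = 28/31 ≈ 0.90, matching the reported 90%. The other figures depend on how the five positive scans with other lesions were classified, which the summary does not state.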


1993 ◽  
Vol 32 (02) ◽  
pp. 175-179 ◽  
Author(s):  
B. Brambati ◽  
T. Chard ◽  
J. G. Grudzinskas ◽  
M. C. M. Macintosh

Abstract: The analysis of the clinical efficiency of a biochemical parameter in predicting chromosome anomalies is described, using a database of 475 cases including 30 abnormalities. Two approaches to the statistical analysis were compared: Gaussian frequency distributions with likelihood ratios, and logistic regression. Both methods estimated that, at a 5% false-positive rate, approximately 60% of anomalies are detected on the basis of maternal age and serum PAPP-A. Logistic regression is appropriate where the outcome variable (chromosome anomaly) is binary, and its detection rates refer to the original data only. The likelihood ratio method is used to predict the outcome in the general population; it depends on the data, or some transformation of the data, fitting a known frequency distribution (Gaussian in this case). The precision of the predicted detection rates is limited by the small sample of abnormal cases (30). Varying the means and standard deviations of the fitted log Gaussian distributions to the limits of their 95% confidence intervals produced detection rates between 42% and 79% at a 5% false-positive rate. Thus, although the likelihood ratio method is potentially the better method for determining the usefulness of a test in the general population, larger numbers of abnormal cases are required to stabilise the means and standard deviations of the fitted log Gaussian distributions.
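The sketch below shows how a detection rate at a fixed false-positive rate follows from fitted Gaussian distributions, and why it is sensitive to the fitted means and SDs. It assumes a single log-transformed marker in which low values indicate an anomaly. The combined maternal age plus PAPP-A risk from the study is not modelled, and the distributions in the test are hypothetical values in SD units. With equal SDs, a cut-off on the marker is equivalent to a cut-off on the likelihood ratio.

```python
from statistics import NormalDist

def detection_rate(unaffected, affected, fpr=0.05, low_is_abnormal=True):
    """Detection rate at a fixed false-positive rate for one Gaussian marker.

    unaffected, affected: NormalDist of the (log-transformed) marker.
    The cut-off is placed so that a fraction `fpr` of unaffected cases
    falls on the abnormal side.
    """
    if low_is_abnormal:
        cutoff = unaffected.inv_cdf(fpr)
        return affected.cdf(cutoff)
    cutoff = unaffected.inv_cdf(1 - fpr)
    return 1 - affected.cdf(cutoff)
```

With unaffected ~ N(0, 1) and affected ~ N(-2, 1), the detection rate at 5% FPR is about 64%. Moving the affected mean to -1.5 drops it to about 44%. Uncertainty of this size in fitted parameters, from a small abnormal sample, produces the wide 42% to 79% range reported above.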


2019 ◽  
Author(s):  
Amanda Kvarven ◽  
Eirik Strømland ◽  
Magnus Johannesson

Andrews & Kasy (2019) propose an approach for adjusting effect sizes in meta-analysis for publication bias. We use the Andrews-Kasy estimator to adjust the results of 15 meta-analyses and compare the adjusted results with 15 large-scale multi-lab replication studies estimating the same effects. The pre-registered replications provide precisely estimated effect sizes that do not suffer from publication bias. The Andrews-Kasy approach leads to a moderate reduction of the inflated effect sizes in the meta-analyses. However, the approach still overestimates effect sizes by a factor of about two or more and has an estimated false positive rate of between 57% and 100%.


2019 ◽  
Author(s):  
Stephen D Benning ◽  
Edward Smith

The emergent interpersonal syndrome (EIS) approach conceptualizes personality disorders as the interaction among their constituent traits to predict important criterion variables. We detail the difficulties we have experienced in finding such interactive predictors in our empirical work on psychopathy, even when using uncorrelated traits that maximize power. Rather than explaining a large absolute proportion of variance in interpersonal outcomes, EIS interactions might explain small amounts of variance relative to the main effects of each trait. Indeed, these interactions may require samples of almost 1,000 observations for 80% power at a false positive rate of .05. EIS models must describe which specific traits' interactions constitute a particular EIS, as effect sizes appear to diminish as higher-order trait interactions are analyzed. Considering whether EIS interactions are ordinal with non-crossing slopes, disordinal with crossing slopes, or entail non-linear threshold or saturation effects may help researchers design studies, sampling strategies, and analyses to model their expected effects efficiently.
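The sample-size claim above can be made concrete with a back-of-the-envelope power calculation. The sketch below uses a normal approximation to the noncentral F test for a single-df interaction term, with Cohen's effect size f² (the interaction's increment in R² relative to unexplained variance). This is a standard approximation, not the authors' power analysis, and the f² values in the test are illustrative.

```python
from math import ceil
from statistics import NormalDist

def n_for_interaction(f2, alpha=0.05, power=0.80):
    """Approximate N needed to detect a single-df interaction term with
    effect size f2 (normal approximation to the noncentral F test)."""
    z = NormalDist().inv_cdf
    # Required noncentrality: (z_{1-alpha/2} + z_{power})^2, about 7.85 here.
    noncentrality = (z(1 - alpha / 2) + z(power)) ** 2
    return ceil(noncentrality / f2)
```

An interaction with f² = 0.008 needs roughly 980 observations for 80% power at α = .05, in line with the "almost 1,000" figure. Cohen's conventional "small" effect (f² = 0.02) needs about 390.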

