Concept Drift Detection in Dynamic Probabilistic Relational Models

Author(s):  
Nils Finke ◽  
Tanya Braun ◽  
Marcel Gehrke ◽  
Ralf Möller

Dynamic probabilistic relational models, which represent a full joint distribution in factorized form, are used to cater for uncertainty and for relational and temporal aspects in real-world data. While these models assume the underlying temporal process to be stationary, real-world data often exhibits non-stationary behavior in which the full joint distribution changes over time. We propose an approach to account for non-stationary processes w.r.t. changing probability distributions over time, an effect known as concept drift. We use factorization and a compact encoding of relations to efficiently detect drifts towards new probability distributions based on evidence.
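To make "detecting drift from evidence using a compact encoding" concrete, here is a minimal sketch under stated assumptions; it is not the authors' algorithm. It assumes a single parfactor over a logical variable whose individuals are interchangeable, so recent evidence can be summarized as counts per value of a random variable instead of per individual. The sketch compares the (smoothed) empirical distribution of that evidence with the model's current distribution via KL divergence and flags a drift above a threshold. The threshold, smoothing constant, the random variable `Treat(X)`, and all function names are hypothetical.

```python
import math
from collections import Counter


def empirical_distribution(observations, domain, alpha=1.0):
    """Laplace-smoothed distribution from evidence on interchangeable individuals.

    Because individuals are interchangeable, only the count per value matters,
    so the evidence is kept as a histogram rather than one entry per individual.
    """
    counts = Counter(observations)
    total = len(observations) + alpha * len(domain)
    return {v: (counts[v] + alpha) / total for v in domain}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same finite domain."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p if p[v] > 0)


def detect_drift(model_dist, observations, threshold=0.1):
    """Return (drifted, divergence) comparing recent evidence with the model.

    `threshold` is a hypothetical tuning parameter.
    """
    emp = empirical_distribution(observations, list(model_dist))
    divergence = kl_divergence(emp, model_dist)
    return divergence > threshold, divergence


# Hypothetical model: P(Treat(X) = true) = 0.2 for every individual X.
model = {True: 0.2, False: 0.8}
stable = [True] * 20 + [False] * 80    # evidence matching the model
shifted = [True] * 60 + [False] * 40   # evidence from a new distribution
print(detect_drift(model, stable))     # drifted is False
print(detect_drift(model, shifted))    # drifted is True
```

After a detected drift, a model of this kind would be updated towards the new distribution; how that update is done in a dynamic relational model is outside this sketch.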

Smart Cities ◽  
2021 ◽  
Vol 4 (1) ◽  
pp. 349-371
Author(s):  
Hassan Mehmood ◽  
Panos Kostakos ◽  
Marta Cortes ◽  
Theodoros Anagnostopoulos ◽  
Susanna Pirttikangas ◽  
...  

Real-world data streams pose a unique challenge to the implementation of machine learning (ML) models and data analysis. A notable problem introduced by the growth of Internet of Things (IoT) deployments across the smart city ecosystem is that the statistical properties of data streams can change over time, resulting in poor prediction performance and ineffective decisions. While concept drift detection methods aim to address this problem, emerging communication and sensing technologies are generating massive amounts of data, requiring distributed environments to perform computation tasks across smart city administrative domains. In this article, we implement and test a number of state-of-the-art active concept drift detection algorithms for time series analysis within a distributed environment. We use real-world data streams and provide a critical analysis of the results. The challenges of implementing concept drift adaptation algorithms, along with their applications in smart cities, are also discussed.
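Active detectors monitor a stream and raise an explicit alarm when they detect a change. As one illustration (chosen here; the abstract does not say whether it was among the algorithms evaluated), the Page-Hinkley test tracks the cumulative deviation of observations from their running mean and signals a drift when that deviation rises more than a threshold above its historical minimum. The parameter values and the synthetic stream are illustrative assumptions.

```python
class PageHinkley:
    """Page-Hinkley test for detecting an increase in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta          # magnitude of change tolerated as noise
        self.threshold = threshold  # alarm threshold (often called lambda)
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cumulative = 0.0       # running sum of (x_t - mean_t - delta)
        self.minimum = float("inf")

    def update(self, x):
        """Add one observation; return True if a drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n   # incremental running mean
        self.cumulative += x - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        if self.cumulative - self.minimum > self.threshold:
            self.reset()  # start monitoring the new concept from scratch
            return True
        return False


# Synthetic stream: alternating values around mean 0, then around mean 3.
stream = [(-1) ** i for i in range(500)] + [3 + (-1) ** i for i in range(500, 1000)]
detector = PageHinkley()
alarms = [i for i, x in enumerate(stream) if detector.update(x)]
print(alarms)
```

On this stream the first alarm fires shortly after index 500, once enough evidence of the higher mean has accumulated; lowering `threshold` trades faster detection for more false alarms on noisy data.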


2021 ◽  
Vol 37 (10) ◽  
pp. S79
Author(s):  
D de Verteuil ◽  
L Azzi ◽  
L Lambert ◽  
B Daneault ◽  
E Dumont ◽  
...  

2017 ◽  
Vol 20 (9) ◽  
pp. A487
Author(s):  
Y Huang ◽  
TE Hartog ◽  
R Vaghjiani ◽  
N Patterson ◽  
H Van Lier ◽  
...  

2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Yange Sun ◽  
Zhihai Wang ◽  
Yang Bai ◽  
Honghua Dai ◽  
Saeid Nahavandi

It is common in real-world data streams for previously seen concepts to reappear, which gives rise to a particular kind of concept drift known as recurring concepts. Unfortunately, most existing algorithms do not take full account of this case. Motivated by this challenge, a novel paradigm is proposed for capturing and exploiting recurring concepts in data streams. It not only incorporates a distribution-based change detector for handling concept drift but also captures recurring concepts by storing them in a classifier graph. The ability to detect recurring drifts allows previously learned models to be reused, enhancing overall learning performance. Extensive experiments on both synthetic and real-world data streams show that the approach performs significantly better than state-of-the-art algorithms, especially when concepts reappear.
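The general idea of reusing models for recurring concepts can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's classifier graph: each stored concept keeps a reference window of a one-dimensional feature plus its model, and a new window is matched to a stored concept with the two-sample Kolmogorov-Smirnov statistic (a swapped-in distribution comparison). If no stored concept is close enough, the window is treated as a new concept and a new model is trained. The threshold, the placeholder "model" (just the window mean), and all names are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in points)


class ConceptRepository:
    """Stores (reference window, model) pairs so earlier concepts can be reused."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # hypothetical KS distance for "same concept"
        self.concepts = []          # list of (reference_window, model)

    def match_or_add(self, window, train_model):
        """Return (model, reused): reuse the closest stored concept or train anew."""
        best = None
        for ref, model in self.concepts:
            d = ks_statistic(ref, window)
            if d <= self.threshold and (best is None or d < best[0]):
                best = (d, model)
        if best is not None:
            return best[1], True
        model = train_model(window)
        self.concepts.append((list(window), model))
        return model, False


def train(window):
    """Placeholder 'model' for the sketch: the rounded window mean."""
    return round(sum(window) / len(window), 2)


repo = ConceptRepository()
concept_a = [i / 10 for i in range(20)]             # values 0.0 .. 1.9
concept_b = [5 + i / 10 for i in range(20)]         # values 5.0 .. 6.9
recurring_a = [0.05 + i / 10 for i in range(20)]    # concept A again, slightly shifted
results = [repo.match_or_add(w, train) for w in (concept_a, concept_b, recurring_a)]
print(results)
```

The third window is matched to the first stored concept, so its model is reused instead of being retrained.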


Author(s):  
Felix Hennings ◽  
Lovis Anderson ◽  
Kai Hoppmann-Baum ◽  
Mark Turner ◽  
Thorsten Koch

Abstract
Compressor stations are the heart of every high-pressure gas transport network. Located at intersection areas of the network, they are contained in huge, complex plants where, in combination with valves and regulators, they are responsible for routing and pushing the gas through the network. Due to their complexity and the lack of data, compressor stations are usually treated in the scientific literature in a highly simplified and idealized manner. As part of an ongoing project with one of Germany’s largest transmission system operators to develop a decision support system for their dispatching center, we investigated how to automate the control of compressor stations. Each station has to be in a particular configuration, which, in combination with the other nearby elements, leads to a discrete set of up to 2000 feasible operation modes in the intersection area. Since the desired performance of the station changes over time, the configuration of the station has to adapt. Our goal is to minimize the necessary changes in the overall operation modes and related elements over time while fulfilling a preset performance envelope or demand scenario. This article describes the chosen model and the implemented algorithms, based on mixed-integer programming, that tackle this challenge. By presenting extensive computational results on real-world data, we demonstrate the performance of our approach.
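The objective of minimizing configuration changes while meeting a demand scenario can be illustrated on a toy instance. The article uses mixed-integer programming; the sketch below swaps in a simple dynamic program over time steps, which is only practical when the feasible modes for each step are listed explicitly and every change costs the same. Mode names, the demand scenario, and the unit switching cost are hypothetical.

```python
def min_switch_schedule(feasible):
    """Pick one feasible mode per time step, minimizing the number of mode changes.

    feasible: list (one entry per time step) of lists of modes that satisfy that
    step's demand. Returns (number_of_changes, schedule).
    Runs in O(T * M^2) for T steps and at most M modes per step.
    """
    # best[m] = (changes, schedule) for the cheapest schedule ending in mode m
    best = {m: (0, [m]) for m in feasible[0]}
    for modes in feasible[1:]:
        new_best = {}
        for m in modes:
            # Staying in the same mode is free; switching costs one change.
            candidates = [
                (cost + (prev != m), schedule + [m])
                for prev, (cost, schedule) in best.items()
            ]
            new_best[m] = min(candidates, key=lambda c: c[0])
        best = new_best
    return min(best.values(), key=lambda c: c[0])


# Hypothetical demand scenario: feasible station modes at four time steps.
feasible = [
    ["A", "B"],
    ["B", "C"],
    ["B", "D"],
    ["C", "D"],
]
print(min_switch_schedule(feasible))
```

Here a single change suffices (staying in mode B for three steps); a MIP formulation becomes necessary once changes have different costs or couple several network elements.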


2021 ◽  
Vol 4 (Supplement_1) ◽  
pp. 63-65
Author(s):  
D Y Yang ◽  
T Mullie ◽  
H Sun ◽  
L Russell ◽  
B Roach ◽  
...  

Abstract
Background: Fecal microbiota transplantation (FMT) is the most effective therapy for recurrent C. difficile infection. Although studies using statistical modeling have shown FMT to be cost-effective, real-world data are lacking.
Aims: To assess the impact of an FMT program on the healthcare cost of recurrent C. difficile infection using real-world data from Alberta’s public healthcare system.
Methods: Patients with C. difficile infection were identified through the provincial laboratory database as those with positive C. difficile results in Edmonton, Alberta, between 2009 and 2016. If an initial positive test was followed by ≥2 positive tests within 183 days, the individual was categorized as having recurrent C. difficile infection (RCDI); otherwise, non-recurrent C. difficile infection (non-RCDI) was assigned. Because the Edmonton FMT program was established in 2013, patients were further divided into pre-FMT (2009–12) and post-FMT (2013–16) eras. This divided patients into four study groups, as outlined in Table 1. Administrative data, including inpatient stays, ambulatory or emergency room visits, outpatient prescriptions, and physician billings, were extracted. A cost of $389 was assigned to each FMT procedure to account for the cost of donor screening and sample preparation. A difference-in-differences (DID) approach, which estimates the effect of a treatment by comparing the change in outcome over time between a treatment group and a control group, was used to analyze the impact of the FMT program on the cost of RCDI. Non-RCDI patients served as the control group to account for changes in treatment costs over time. Ordinary least squares regression, with log-transformed healthcare cost as the dependent variable, was used for the analysis.
Results: 4717 non-RCDI and 548 RCDI patients were identified and divided into the four groups (Table 1). RCDI patients were significantly older than non-RCDI patients (71.13 vs 62.49; P < 0.001). After adjusting for differences in age, sex, and baseline healthcare utilization, costs for RCDI patients were significantly lower relative to costs for non-RCDI patients in the post-FMT era. The cost of non-RCDI increased by $5,300.08 between the pre- and post-FMT eras, while the cost of RCDI decreased by $7,654.92 over the same period (Table 2). The FMT program was estimated to have saved $12,954 annually for RCDI patients at the mean age, sex, and baseline cost of our overall sample.
Conclusions: Our data suggest that the healthcare cost of RCDI has decreased with the introduction of an FMT program.
Funding Agencies: Alberta Health Services, University of Alberta Hospital Foundation
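The difference-in-differences logic described in the Methods can be checked against the unadjusted changes reported in the Results. In this minimal sketch (function and variable names are hypothetical), the DID estimate is the change in the treated group minus the change in the control group. Only the changes are reported, so both pre-period baselines are set to zero. The raw subtraction gives about -$12,955, close to the $12,954 annual saving the authors report; their figure comes from the adjusted log-cost regression, not from this subtraction.

```python
def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    return (treated_post - treated_pre) - (control_post - control_pre)


# Reported changes between the pre-FMT and post-FMT eras (baselines set to 0).
rcdi_change = -7654.92      # RCDI patients (treated group)
non_rcdi_change = 5300.08   # non-RCDI patients (control group)

did = difference_in_differences(0.0, rcdi_change, 0.0, non_rcdi_change)
print(round(did, 2))
```

In the regression form used in the study, the same quantity corresponds to the coefficient on the interaction between the RCDI indicator and the post-FMT-era indicator.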


2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Tinofirei Museba ◽  
Fulufhelo Nelwamondo ◽  
Khmaies Ouahada

Beyond applying machine learning predictive models to static tasks, a significant body of research applies such models to streaming environments that incur concept drift. With the prevalence of streaming real-world applications associated with changes in the underlying data distribution, the need for applications capable of adapting to evolving, time-varying environments can hardly be overstated. Dynamic environments are nonstationary: they change over time, and the target variables to be predicted by the learning algorithm often evolve as well, a phenomenon known as concept drift. Most work on handling concept drift focuses on updating the prediction model so that it can recover from concept drift, while little effort has been dedicated to formulating a learning system capable of learning different types of drifting concepts at any time with minimal overhead. This work proposes a novel, evolving data stream classifier called Adaptive Diversified Ensemble Selection Classifier (ADES) that significantly improves adaptation to different types of concept drift at any time and improves convergence to new concepts by exploiting different amounts of ensemble diversity. The ADES algorithm generates diverse base classifiers, thereby optimizing the margin distribution and exploiting ensemble diversity to form an ensemble classifier that generalizes well to unseen instances and recovers quickly from different types of concept drift. Empirical experiments on both artificial and real-world data streams demonstrate that ADES can adapt to different types of drift at any given time. The prediction performance of ADES is compared with that of three other ensemble classifiers designed to handle concept drift, using both artificial and real-world data streams. This comparative evaluation demonstrates the ability of ADES to handle different types of concept drift.
The experimental results, including statistical tests, indicate performance comparable to other algorithms designed to handle concept drift and demonstrate the significance and effectiveness of the approach.
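The two notions ADES builds on, ensemble diversity and the margin distribution, can be made concrete with standard definitions. The sketch below computes the mean pairwise disagreement between base classifiers (a common diversity measure) and each example's voting margin (fraction of votes for the true class minus the largest fraction for any other class). It is a generic illustration, not the ADES selection procedure; the toy predictions are made up.

```python
from collections import Counter
from itertools import combinations


def disagreement(pred_a, pred_b):
    """Fraction of examples on which two classifiers predict different labels."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


def mean_pairwise_disagreement(predictions):
    """Average disagreement over all pairs of base classifiers (higher = more diverse)."""
    pairs = list(combinations(predictions, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


def voting_margins(predictions, labels):
    """Per-example margin in [-1, 1]; a positive margin means the majority vote is correct."""
    n_models = len(predictions)
    margins = []
    for i, y in enumerate(labels):
        votes = Counter(p[i] for p in predictions)
        correct = votes.pop(y, 0) / n_models
        wrong = max(votes.values(), default=0) / n_models
        margins.append(correct - wrong)
    return margins


# Toy predictions of three base classifiers on four examples.
labels = [0, 1, 1, 0]
predictions = [
    [0, 1, 0, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
]
print(mean_pairwise_disagreement(predictions))
print(voting_margins(predictions, labels))
```

Each classifier here errs on different examples, so they disagree often, yet every majority vote is correct (all margins positive); that combination is what a diversity-aware ensemble aims for.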


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Tinofirei Museba ◽  
Fulufhelo Nelwamondo ◽  
Khmaies Ouahada ◽  
Ayokunle Akinola

For most real-world data streams, the concept about which data are obtained may shift from time to time, a phenomenon known as concept drift. In many real-world applications, such as nonstationary time-series data, concept drift often occurs cyclically, and previously seen concepts reappear, giving rise to a particular kind of concept drift known as recurring concepts. A cyclically drifting concept tends to return to previously visited states. Existing machine learning algorithms handle recurring concepts by retraining a learning model when a drift is detected, which loses information if the concept was well learned by the model and then recurs in a later learning phase. A common remedy is to retain and reuse previously learned models, but appropriately selecting an optimal ensemble classifier capable of accurately adapting to recurring concepts is time-consuming and computationally prohibitive in nonstationary environments. Time-dependent applications need fast and accurate machine learning algorithms to learn from streaming data. Most existing algorithms designed to handle concept drift do not take the presence of recurring concept drift into account. To handle recurring concepts accurately and efficiently with minimal computational overhead, we propose a novel, evolving ensemble method called Recurrent Adaptive Classifier Ensemble (RACE). The algorithm preserves an archive of diverse, previously learned models and always trains both new and existing classifiers. Empirical experiments on synthetic and real-world data stream benchmarks show that RACE adapts to recurring concepts significantly more accurately than some state-of-the-art ensemble classifiers based on classifier reuse.
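The benefit of keeping an archive of previously learned models can be sketched generically. In this simplified illustration (an assumption, not the RACE algorithm; in particular, archived models are re-weighted but not incrementally retrained), data arrive in chunks, each chunk trains one new model, every archived model is scored on the newest chunk, and predictions are an accuracy-weighted vote. When the archive exceeds its capacity, the model that performs worst on the current chunk is evicted. A model that fits a recurring concept therefore regains weight when that concept returns. The base learner (a one-feature threshold stump), the capacity, and all names are hypothetical.

```python
def train_stump(chunk):
    """Hypothetical base learner: best single threshold on a 1-D feature."""
    best = None
    for t, _ in chunk:
        for sign in (1, -1):
            acc = sum((sign * (x - t) >= 0) == y for x, y in chunk) / len(chunk)
            if best is None or acc > best[0]:
                best = (acc, t, sign)
    _, t, sign = best
    return lambda x: sign * (x - t) >= 0


def accuracy(model, chunk):
    return sum(model(x) == y for x, y in chunk) / len(chunk)


class ArchiveEnsemble:
    """Accuracy-weighted vote over a bounded archive of chunk-trained models."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.archive = []  # list of [model, weight]

    def update(self, chunk):
        """Train a new model on the chunk and re-weight every archived model."""
        self.archive.append([train_stump(chunk), 0.0])
        for entry in self.archive:
            entry[1] = accuracy(entry[0], chunk)
        if len(self.archive) > self.capacity:
            self.archive.remove(min(self.archive, key=lambda e: e[1]))

    def predict(self, x):
        yes = sum(w for m, w in self.archive if m(x))
        no = sum(w for m, w in self.archive if not m(x))
        return yes >= no


# Concept A: label is x >= 5; concept B flips it; then concept A recurs.
concept_a = [(x, x >= 5) for x in range(10)]
concept_b = [(x, x < 5) for x in range(10)]
ens = ArchiveEnsemble(capacity=3)
for name, chunk in (("A", concept_a), ("B", concept_b), ("A", concept_a)):
    ens.update(chunk)
    print(name, ens.predict(7), [w for _, w in ens.archive])
```

When concept A returns, the model trained on the first chunk is still in the archive and its weight goes back to 1.0, so the ensemble recovers immediately instead of waiting for a new model.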


2020 ◽  
Author(s):  
Raymond A. Harvey ◽  
Jeremy A. Rassen ◽  
Carly A. Kabelac ◽  
Wendy Turenne ◽  
Sandy Leonard ◽  
...  

Abstract
Importance: There is limited evidence regarding whether the presence of serum antibodies to SARS-CoV-2 is associated with a decreased risk of future infection. Understanding susceptibility to infection and the role of immune memory is important for identifying at-risk populations and could have implications for vaccine deployment.
Objective: The purpose of this study was to evaluate subsequent evidence of SARS-CoV-2 infection, based on diagnostic nucleic acid amplification tests (NAAT), among individuals who are antibody-positive compared with those who are antibody-negative, using real-world data.
Design: This was an observational descriptive cohort study.
Participants: The study used a national sample to create cohorts from a de-identified dataset composed of commercial laboratory test results, open and closed medical and pharmacy claims, electronic health records, hospital billing (chargemaster) data, and payer enrollment files from the United States. Patients were indexed as antibody-positive or antibody-negative according to their first SARS-CoV-2 antibody test recorded in the database. Patients with more than one antibody test on the index date whose results were discordant were excluded.
Main Outcomes/Measures: Primary endpoints were index antibody test results and post-index diagnostic NAAT results, with infection defined as a positive diagnostic test post-index, measured in 30-day intervals (0-30, 31-60, 61-90, >90 days). Additional measures included demographic, geographic, and clinical characteristics at the time of the index antibody test, such as recorded signs and symptoms or prior evidence of COVID-19 (diagnoses or NAAT+), and recorded comorbidities.
Results: We included 3,257,478 unique patients with an index antibody test. Of these, 2,876,773 (88.3%) had a negative index antibody result, 378,606 (11.6%) had a positive index antibody result, and 2,099 (0.1%) had an inconclusive index antibody result. Patients with a negative antibody test were somewhat older at index than those with a positive result (mean of 48 versus 44 years). A fraction (18.4%) of individuals who were initially seropositive converted to seronegative over the follow-up period. During the follow-up periods, the ratio (CI) of positive NAAT results among individuals who had a positive antibody test at index versus those with a negative antibody test at index was 2.85 (2.73 - 2.97) at 0-30 days, 0.67 (0.6 - 0.74) at 31-60 days, 0.29 (0.24 - 0.35) at 61-90 days, and 0.10 (0.05 - 0.19) at >90 days.
Conclusions: Patients who display positive antibody tests are initially more likely to have a positive NAAT, consistent with prolonged RNA shedding, but over time become markedly less likely to have a positive NAAT. This result suggests that seropositivity using commercially available assays is associated with protection from infection. The duration of protection is unknown and may wane over time; this parameter will need to be addressed in a study with an extended duration of follow-up.
Key Points
Question: Can real-world data be used to evaluate the comparative risk of SARS-CoV-2 infection for individuals who are antibody-positive versus antibody-negative?
Finding: Of patients indexed on a positive antibody test, 10 of 3,226 with a NAAT (0.3%) had evidence of a positive NAAT > 90 days after index, compared with 491 of 16,157 (3.0%) indexed on a negative antibody test.
Meaning: Individuals who are seropositive for SARS-CoV-2 based on commercial assays may be at decreased future risk of SARS-CoV-2 infection.
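The Key Points counts can be turned into a risk ratio with a confidence interval. The sketch below uses the standard log-scale (Wald) 95% interval for a ratio of two proportions; the abstract does not state which interval method was used, so that choice is an assumption. With these counts it gives about 0.10 (0.05-0.19), consistent with the reported >90-day ratio.

```python
import math


def risk_ratio_ci(events_1, total_1, events_2, total_2, z=1.96):
    """Ratio of proportions (group 1 / group 2) with a log-scale Wald interval."""
    p1 = events_1 / total_1
    p2 = events_2 / total_2
    rr = p1 / p2
    # Standard error of log(RR) for two independent binomial proportions.
    se_log = math.sqrt(1 / events_1 - 1 / total_1 + 1 / events_2 - 1 / total_2)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# More than 90 days after index: antibody-positive vs antibody-negative with a NAAT.
rr, lo, hi = risk_ratio_ci(10, 3226, 491, 16157)
print(f"{10 / 3226:.1%} vs {491 / 16157:.1%}; ratio {rr:.2f} ({lo:.2f}-{hi:.2f})")
# 0.3% vs 3.0%; ratio 0.10 (0.05-0.19)
```

With only 10 events in the antibody-positive group, the 1/events_1 term dominates the standard error, which is why the interval is wide relative to the point estimate.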

