ADES: A New Ensemble Diversity-Based Approach for Handling Concept Drift

Beyond static tasks, a significant body of research applies machine learning predictive models to streaming environments that are subject to concept drift. Given the prevalence of real-world streaming applications whose underlying data distribution changes, the need for applications capable of adapting to evolving, time-varying environments can hardly be overstated. Dynamic environments are nonstationary, and the target variables to be predicted by the learning algorithm often evolve with time, a phenomenon known as concept drift. Most work on handling concept drift focuses on updating the prediction model so that it can recover from drift, while little effort has been dedicated to formulating a learning system capable of learning different types of drifting concepts at any time with minimum overhead. This work proposes a novel, evolving data stream classifier called Adaptive Diversified Ensemble Selection Classifier (ADES) that significantly improves adaptation to different types of concept drift at any time and improves convergence to new concepts by exploiting different amounts of ensemble diversity. The ADES algorithm generates diverse base classifiers, thereby optimizing the margin distribution and exploiting ensemble diversity to form an ensemble classifier that generalizes well to unseen instances and recovers quickly from different types of concept drift. The prediction performance of ADES is compared with that of three other ensemble classifiers designed to handle concept drift, using both artificial and real-world data streams. The experiments demonstrate that ADES can adapt to different types of drift at any given time. The experimental results, including statistical tests, indicate performance comparable to that of other algorithms designed to handle concept drift and support the significance and effectiveness of ADES.
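
The abstract does not say how ADES quantifies ensemble diversity. As a rough illustration only, the sketch below computes one common diversity measure, the mean pairwise disagreement between base classifiers. The function names and the example predictions are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def disagreement(preds_a, preds_b):
    """Fraction of instances on which two classifiers predict different labels."""
    if len(preds_a) != len(preds_b) or not preds_a:
        raise ValueError("prediction lists must be non-empty and equal in length")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def ensemble_diversity(all_preds):
    """Mean pairwise disagreement over every pair of ensemble members."""
    pairs = list(combinations(all_preds, 2))
    if not pairs:
        raise ValueError("need at least two ensemble members")
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical predictions from three base classifiers on five instances.
member_preds = [
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
]
diversity = ensemble_diversity(member_preds)
print(f"mean pairwise disagreement: {diversity:.2f}")
```

A value near 0 means the members almost always agree, while larger values mean they err on different instances, which is the property a diversity-based ensemble tries to exploit.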


Concept Drift Adaptation Techniques in Distributed Environment for Real-World Data Streams

Smart Cities ◽

10.3390/smartcities4010021 ◽

2021 ◽

Vol 4 (1) ◽

pp. 349-371

Author(s):

Hassan Mehmood ◽

Panos Kostakos ◽

Marta Cortes ◽

Theodoros Anagnostopoulos ◽

Susanna Pirttikangas ◽

...

Keyword(s):

Real World ◽

Data Streams ◽

Smart City ◽

Smart Cities ◽

Concept Drift ◽

Distributed Environment ◽

Real World Data ◽

Unique Challenge ◽

World Data ◽

Concept Drift Detection

Real-world data streams pose a unique challenge to the implementation of machine learning (ML) models and data analysis. A notable problem introduced by the growth of Internet of Things (IoT) deployments across the smart city ecosystem is that the statistical properties of data streams can change over time, resulting in poor prediction performance and ineffective decisions. While concept drift detection methods aim to address this problem, emerging communication and sensing technologies are generating massive amounts of data, requiring distributed environments to perform computation tasks across smart city administrative domains. In this article, we implement and test a number of state-of-the-art active concept drift detection algorithms for time series analysis within a distributed environment. We use real-world data streams and provide a critical analysis of the results. The challenges of implementing concept drift adaptation algorithms, along with their applications in smart cities, are also discussed.
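
The abstract does not name the detectors that were tested. As a generic illustration of what an active drift detector does, the following sketch implements the classical Page-Hinkley test for an upward shift in the mean of a stream. The parameter values (`delta`, `threshold`, `min_samples`) and the synthetic stream are assumptions chosen for the example, not settings from the article.

```python
class PageHinkley:
    """Page-Hinkley test for detecting an increase in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=5.0, min_samples=10):
        self.delta = delta              # tolerated magnitude of change
        self.threshold = threshold      # alarm threshold (lambda)
        self.min_samples = min_samples  # observations required before alarming
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # cumulative deviation from the running mean
        self.cum_min = 0.0  # smallest cumulative deviation seen so far

    def update(self, x):
        """Consume one observation; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        if self.n >= self.min_samples and self.cum - self.cum_min > self.threshold:
            self.reset()  # start fresh on the new concept
            return True
        return False


# Synthetic stream (hypothetical): the mean jumps from 0 to 1 at index 50.
stream = [0.0] * 50 + [1.0] * 50
detector = PageHinkley()
alarms = [i for i, x in enumerate(stream) if detector.update(x)]
print("drift signalled at indices:", alarms)
```

In a distributed setting, one such detector instance could be kept per sensor or per partition, since its state is only four numbers.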


Developing a machine learning environmental allergy prediction model from real world data through a novel decentralized mobile study platform.

10.1101/2020.09.21.20199224 ◽

2020 ◽

Author(s):

Chethan Sarabu ◽

Sandra Steyaert ◽

Nirav Shah

Keyword(s):

Machine Learning ◽

Real World ◽

Predictive Models ◽

Clinical Care ◽

Sensor Data ◽

Real World Data ◽

World Data ◽

Demographic Groups ◽

Prediction And Prevention ◽

Wide Range

Environmental allergies cause significant morbidity across a wide range of demographic groups. This morbidity could be mitigated through individualized predictive models capable of guiding personalized preventive measures. We developed a predictive model by integrating smartphone sensor data with symptom diaries maintained by patients. The machine learning model was found to be highly predictive, with an accuracy of 0.801. Such models based on real-world data can guide clinical care for patients and providers, reduce the economic burden of uncontrolled allergies, and set the stage for subsequent research pursuing allergy prediction and prevention. Moreover, this study offers proof of principle for the feasibility of building clinically useful predictive models from "messy," participant-derived real-world data.


A Classifier Graph Based Recurring Concept Detection and Prediction Approach

Computational Intelligence and Neuroscience ◽

10.1155/2018/4276291 ◽

2018 ◽

Vol 2018 ◽

pp. 1-13 ◽

Cited By ~ 4

Author(s):

Yange Sun ◽

Zhihai Wang ◽

Yang Bai ◽

Honghua Dai ◽

Saeid Nahavandi

Keyword(s):

Real World ◽

Data Streams ◽

Concept Drift ◽

Learning Performance ◽

Concept Detection ◽

Real World Data ◽

Full Account ◽

World Data ◽

Prediction Approach ◽

Better Than

In real-world data streams, previously seen concepts commonly reappear, giving rise to a distinct kind of concept drift known as recurring concepts. Unfortunately, most existing algorithms do not take full account of this case. Motivated by this challenge, a novel paradigm is proposed for capturing and exploiting recurring concepts in data streams. It incorporates a distribution-based change detector for handling concept drift and also captures recurring concepts by storing them in a classifier graph. The ability to detect recurring drifts allows previously learnt models to be reused and enhances overall learning performance. Extensive experiments on both synthetic and real-world data streams reveal that the approach performs significantly better than state-of-the-art algorithms, especially when concepts reappear.
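
The abstract gives no detail on how the classifier graph is built. One plausible reading, sketched below purely as an assumption, is a graph whose nodes hold the model learnt for each concept and whose edges count observed transitions between concepts, so that a stored model can be reused and the likely next concept anticipated. The class `ConceptGraph` and its methods are hypothetical names, not the paper's API.

```python
from collections import defaultdict


class ConceptGraph:
    """Nodes store one model per concept; edges count concept transitions."""

    def __init__(self):
        self.models = {}
        self.transitions = defaultdict(lambda: defaultdict(int))
        self.current = None

    def switch_to(self, concept_id, model=None):
        """Record a drift to `concept_id` and return the model stored for it.

        If the concept was seen before, its stored model is reused and the
        `model` argument is ignored.
        """
        if concept_id not in self.models:
            self.models[concept_id] = model
        if self.current is not None and self.current != concept_id:
            self.transitions[self.current][concept_id] += 1
        self.current = concept_id
        return self.models[concept_id]

    def predict_next(self):
        """Most frequently observed successor of the current concept, if any."""
        successors = self.transitions.get(self.current)
        if not successors:
            return None
        return max(successors, key=successors.get)


# Hypothetical sequence of detected concepts; strings stand in for real models.
graph = ConceptGraph()
for cid in ["A", "B", "A", "C", "A", "B"]:
    graph.switch_to(cid, model=f"model-{cid}")

reused = graph.switch_to("A")        # concept A recurs: stored model comes back
next_concept = graph.predict_next()  # A was followed by B more often than by C
print(reused, next_concept)
```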


Validation of machine learning models to predict dementia-related neuropsychiatric symptoms in real-world data

10.21203/rs.3.rs-17985/v1 ◽

2020 ◽

Author(s):

Javier Mar ◽

Ania Gorostiza ◽

Oliver Ibarrondo ◽

Carlos Cernuda ◽

Arantzazu Arrospide ◽

...

Keyword(s):

Machine Learning ◽

Depressive Symptoms ◽

Real World ◽

Predictive Models ◽

Neuropsychiatric Symptoms ◽

Health Record ◽

Kappa Index ◽

Real World Data ◽

World Data ◽

Electronic Health

Background: Neuropsychiatric symptoms (NPS) are the leading cause of the social burden of dementia, but their role is underestimated. The objective of the study was to validate predictive models to separately identify psychotic and depressive symptoms in patients diagnosed with dementia using clinical databases representing the whole population (real-world data).

Methods: First, we searched the electronic health records of 4,003 patients with dementia to identify NPS. Second, machine learning (random forest) algorithms were applied to the training sample (N=3,003) to build separate predictive models for psychotic and depressive symptoms. To evaluate the classification ability of the models, the following statistics were calculated for each model: the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, accuracy, no-information rate and Kappa index. Third, calibration and discrimination were assessed in the validation sample (N=1,000) to assess the performance of the models. A calibration curve was drawn by plotting the predicted probabilities for groups on the x-axis and the mean observed values on the y-axis.

Results: Neuropsychiatric symptoms were noted in the electronic health record of 58% of patients. The AUC reached 0.80 for the psychotic symptoms model and 0.74 for the depressive symptoms model. The Kappa index and accuracy also showed better discrimination in the psychotic model. Calibration plots indicated that both models were less accurate when the probability of neuropsychiatric symptoms was < 25%. The most important variables in the psychotic symptom model were use of risperidone, level of sedation, use of quetiapine and haloperidol, and the number of antipsychotics prescribed. In the depressive symptom model, the most important variables were the number of antidepressants prescribed, use of escitalopram, level of sedation and age.

Conclusions: More than half of the sample had NPS as identified by the presence of key terms in the electronic health record. Although NPS are not encoded, they are treated with antipsychotics and antidepressants, which allows valid predictive models to be developed by combining machine learning tools and real-world data. Given their good performance, the predictive models can be used to estimate the prevalence of NPS in population databases.
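
For readers unfamiliar with the statistics listed in the Methods, the sketch below shows how sensitivity, specificity, accuracy, the no-information rate and Cohen's Kappa follow from a binary confusion matrix. The counts used are hypothetical and are not results from the study; the function assumes both classes are present and that chance agreement is below 1.

```python
def binary_metrics(tp, fp, fn, tn):
    """Classification statistics from the four cells of a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    # No-information rate: accuracy obtained by always predicting the majority class.
    nir = max(tp + fn, tn + fp) / n
    # Agreement expected by chance, from the predicted and actual class totals.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "no_information_rate": nir,
        "kappa": kappa,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
m = binary_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Comparing accuracy with the no-information rate shows whether a model beats the trivial majority-class predictor, and Kappa rescales accuracy so that 0 corresponds to chance-level agreement.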


Validation and calibration of machine‐learning predictive models aimed to identify dementia‐related neuropsychiatric symptoms on real‐world data (RWD)

Alzheimer's & Dementia ◽

10.1002/alz.039104 ◽

2020 ◽

Vol 16 (S6) ◽

Author(s):

Javier Mar ◽

Ania Gorostiza ◽

Oliver Ibarrondo ◽

Carlos Cernuda ◽

Ane Alberdi ◽

...

Keyword(s):

Machine Learning ◽

Real World ◽

Predictive Models ◽

Neuropsychiatric Symptoms ◽

Real World Data ◽

World Data


Recurrent Adaptive Classifier Ensemble for Handling Recurring Concept Drifts

Applied Computational Intelligence and Soft Computing ◽

10.1155/2021/5533777 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Tinofirei Museba ◽

Fulufhelo Nelwamondo ◽

Khmaies Ouahada ◽

Ayokunle Akinola

Keyword(s):

Machine Learning ◽

Real World ◽

Concept Drift ◽

Learning Algorithms ◽

Learning Model ◽

Machine Learning Algorithms ◽

Classifier Ensemble ◽

Series Data ◽

Real World Data ◽

World Data

In most real-world data streams, the concept from which data is generated may shift from time to time, a phenomenon known as concept drift. In many real-world applications, such as nonstationary time-series data, concept drift occurs in a cyclic fashion and previously seen concepts reappear, a distinct kind of concept drift known as recurring concepts. A cyclically drifting concept tends to return to previously visited states. Existing machine learning algorithms typically handle recurring concepts by retraining a learning model whenever concept drift is detected, which discards information about a concept that the model had learned well and that will recur in a later learning phase. A common remedy is to retain and reuse previously learned models, but in nonstationary environments, selecting an optimal ensemble classifier capable of accurately adapting to recurring concepts is time-consuming and computationally prohibitive. Learning from streaming data requires fast and accurate machine learning algorithms for time-dependent applications. Most existing algorithms designed to handle concept drift do not take recurring concept drift into account. To handle recurring concepts accurately and efficiently with minimum computational overhead, we propose a novel, evolving ensemble method called Recurrent Adaptive Classifier Ensemble (RACE). The algorithm preserves an archive of diverse, previously learned models and always trains both new and existing classifiers. Empirical experiments on synthetic and real-world data stream benchmarks show that RACE adapts to recurring concepts significantly more accurately than some state-of-the-art ensemble classifiers based on classifier reuse.
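
The general idea of retaining and reusing previously learned models can be illustrated with a small sketch. This is not the RACE algorithm, whose selection procedure the abstract does not describe; it is a simple stand-in that scores every archived model on a window of recent labelled instances and reuses the best one if it is accurate enough. The function name, the `min_accuracy` threshold and the toy models are all hypothetical.

```python
def select_from_archive(archive, window, min_accuracy=0.8):
    """Pick the archived model that best fits the most recent labelled data.

    `archive` maps a name to a callable model; `window` is a list of (x, y)
    pairs. Returns (name, accuracy), with name set to None when no archived
    model reaches `min_accuracy`, in which case a new model should be trained.
    """
    if not window:
        raise ValueError("window must contain at least one labelled instance")
    best_name, best_acc = None, 0.0
    for name, model in archive.items():
        acc = sum(model(x) == y for x, y in window) / len(window)
        if acc > best_acc:
            best_name, best_acc = name, acc
    if best_acc >= min_accuracy:
        return best_name, best_acc
    return None, best_acc


# Two toy models standing in for classifiers learned on earlier concepts.
archive = {
    "positive_is_1": lambda x: int(x > 0),
    "positive_is_0": lambda x: int(x <= 0),
}
# Recent labelled instances after a detected drift (hypothetical data).
recent = [(-2, 1), (3, 0), (1, 0), (-1, 1), (4, 0)]
chosen, acc = select_from_archive(archive, recent)
print(chosen, acc)
```

Evaluating every archived model on each drift scales linearly with the archive size, which is the cost the abstract refers to when it calls naive reuse time-consuming.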
