Discussion and review on evolving data streams and concept drift adapting

2016 ◽  
Vol 9 (1) ◽  
pp. 1-23 ◽  
Author(s):  
Imen Khamassi ◽  
Moamar Sayed-Mouchaweh ◽  
Moez Hammami ◽  
Khaled Ghédira
2020 ◽  
Author(s):  
Álvaro C. Lemos Neto ◽  
Rodrigo A. Coelho ◽  
Cristiano L. de Castro

Due to Big Data and the Internet of Things, Machine Learning algorithms designed specifically to model evolving data streams have gained attention from both academia and industry. Many Incremental Learning models have been successful in doing so, but most of them have one thing in common: they are complex variants of batch learning algorithms, which is a problem because a streaming setting calls for lower complexity and higher performance. This paper proposes the Incremental LSTM model, a variant of the original LSTM with minor changes, that can tackle evolving data stream problems such as concept drift and the elasticity-plasticity dilemma without needing either a dedicated drift detector or a memory management system. The reported results show that it reacts quickly to concept drifts and is also robust to noisy data.


ICT Express ◽  
2020 ◽  
Vol 6 (4) ◽  
pp. 332-338
Author(s):  
Myuu Myuu Wai Yan

Smart Cities ◽  
2021 ◽  
Vol 4 (1) ◽  
pp. 349-371
Author(s):  
Hassan Mehmood ◽  
Panos Kostakos ◽  
Marta Cortes ◽  
Theodoros Anagnostopoulos ◽  
Susanna Pirttikangas ◽  
...  

Real-world data streams pose a unique challenge to the implementation of machine learning (ML) models and data analysis. A notable problem introduced by the growth of Internet of Things (IoT) deployments across the smart city ecosystem is that the statistical properties of data streams can change over time, resulting in poor prediction performance and ineffective decisions. While concept drift detection methods aim to address this problem, emerging communication and sensing technologies are generating massive amounts of data, requiring distributed environments to perform computation tasks across smart city administrative domains. In this article, we implement and test a number of state-of-the-art active concept drift detection algorithms for time series analysis within a distributed environment. We use real-world data streams and provide a critical analysis of the results obtained. The challenges of implementing concept drift adaptation algorithms, along with their applications in smart cities, are also discussed.
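
To make the idea of an active drift detector concrete, the following is a minimal sketch of the Page-Hinkley test, a classical change detector for a monitored signal such as a model's prediction error. The abstract does not name the algorithms the article evaluates, so this is an illustrative example only and not the authors' implementation; the class name, the `detect_drifts` helper, and the default parameter values are assumptions chosen for the sketch.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream.

    All defaults are illustrative assumptions, not values from the article.
    """

    def __init__(self, delta=0.005, threshold=50.0, min_instances=30):
        self.delta = delta                  # tolerated magnitude of change
        self.threshold = threshold          # alarm level for the test statistic
        self.min_instances = min_instances  # warm-up before alarms are allowed
        self.reset()

    def reset(self):
        self.n = 0          # number of values seen since the last reset
        self.mean = 0.0     # running mean of the monitored signal
        self.cum = 0.0      # cumulative deviation from the running mean
        self.cum_min = 0.0  # smallest cumulative deviation seen so far

    def update(self, x):
        """Feed one value; return True if a drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        drift = (self.n >= self.min_instances
                 and self.cum - self.cum_min > self.threshold)
        if drift:
            self.reset()  # start monitoring the new concept from scratch
        return drift


def detect_drifts(stream, **kwargs):
    """Return the indices at which the detector signals a drift."""
    detector = PageHinkley(**kwargs)
    return [i for i, x in enumerate(stream) if detector.update(x)]


if __name__ == "__main__":
    # Hypothetical error signal: the mean jumps from 0 to 1 at index 200.
    signal = [0.0] * 200 + [1.0] * 200
    print(detect_drifts(signal, threshold=10.0))
```

An active detector of this kind only raises the alarm; what happens next (retraining, resetting, or replacing the model) is a separate adaptation step.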


2021 ◽  
Vol 105 ◽  
pp. 107255
Author(s):  
Si-si Zhang ◽  
Jian-wei Liu ◽  
Xin Zuo

Entropy ◽  
2021 ◽  
Vol 23 (7) ◽  
pp. 859
Author(s):  
Abdulaziz O. AlQabbany ◽  
Aqil M. Azmi

We are living in the age of big data, the majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift, a change in the data’s underlying distribution, is a significant issue, especially when learning from data streams. It requires learners to be adaptive to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. The Adaptive Random Forest (ARF), by contrast, is a stream learning algorithm that has shown promising results in terms of its accuracy and ability to deal with various types of drift. The continuity of the incoming instances allows their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase the efficiency of such streaming algorithms by focusing on resampling. Our measure, resampling effectiveness (ρ), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter λ of the Poisson distribution that yields the best value for ρ. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed method of enhancement exhibited considerable improvement in most of the situations.
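
The Poisson approximation mentioned in this abstract can be checked numerically. In batch bootstrap sampling over n instances, the number of times a given instance is drawn follows Binomial(n, 1/n), which approaches Poisson(1) as n grows; online resampling therefore draws a Poisson(λ) count per learner for each arriving instance. The sketch below is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and the measure ρ is not implemented because the abstract does not give its formula.

```python
import math
import random


def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    if k > n:
        return 0.0
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def approximation_gap(n, lam=1.0, k_max=20):
    """Half the L1 distance between Binomial(n, lam/n) and Poisson(lam),
    summed over k = 0..k_max (the tail beyond k_max is ignored)."""
    return 0.5 * sum(
        abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam))
        for k in range(k_max + 1)
    )


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def resampling_weights(n_learners, lam, rng):
    """For one incoming instance, how many times each base learner in a
    (hypothetical) online ensemble would train on it."""
    return [sample_poisson(lam, rng) for _ in range(n_learners)]


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, approximation_gap(n))   # gap shrinks as n grows
    rng = random.Random(0)
    print(resampling_weights(5, 1.0, rng))  # lam is the tunable parameter
```

Larger values of λ make each learner train on an instance more often, which costs execution time; this is the accuracy versus time trade-off that the tuning of λ described above targets.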


2021 ◽  
pp. 1-12
Author(s):  
Salah Ud Din ◽  
Jay Kumar ◽  
Junming Shao ◽  
Cobbinah Bernard Mawuli ◽  
Waldiodio David Ndiaye

Author(s):  
Heitor Murilo Gomes ◽  
Jesse Read ◽  
Albert Bifet ◽  
Robert J. Durrant