Online GBDT with Chunk Dynamic Weighted Majority Learners for Noisy and Drifting Data Streams

Author(s):  
Senlin Luo ◽  
Weixiao Zhao ◽  
Limin Pan
2016 ◽  
Vol 22 (2) ◽  
pp. 511-522 ◽  
Author(s):  
Dhouha Mejri ◽  
Mohamed Limam ◽  
Claus Weihs

Author(s):  
Yang Lu ◽  
Yiu-ming Cheung ◽  
Yuan Yan Tang

Concept drift in data streams jeopardizes the accuracy and stability of the online learning process. If the data stream is also imbalanced, detecting and handling the drift becomes even more challenging. In the literature, these two problems have been intensively addressed separately, but have yet to be well studied when they occur together. In this paper, we propose a chunk-based incremental learning method called Dynamic Weighted Majority for Imbalance Learning (DWMIL) to deal with data streams that exhibit both concept drift and class imbalance. DWMIL uses an ensemble framework that dynamically weights the base classifiers according to their performance on the current data chunk. Compared with existing methods, its merits are four-fold: (1) it stays stable on non-drifted streams and adapts quickly to a new concept; (2) it is fully incremental, i.e. no previous data needs to be stored; (3) it keeps a limited number of classifiers to ensure high efficiency; and (4) it is simple and needs only one thresholding parameter. Experiments on both synthetic and real data sets with concept drift show that DWMIL performs better than state-of-the-art competitors, at lower computational cost.
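The chunk-wise weighting idea described in this abstract can be illustrated with a minimal sketch. The abstract does not give the update rule, so everything specific here is an assumption made for illustration: the multiplicative decay `weight *= (1 - error)`, the pruning rule `weight < theta`, the toy one-dimensional `ThresholdStump` base learner, and the class name `ChunkWeightedEnsemble`. The sketch also leaves out any handling of class imbalance.

```python
from typing import List, Tuple

Chunk = List[Tuple[float, int]]  # (feature, label) pairs; 1-D feature for brevity


class ThresholdStump:
    """Toy base learner (hypothetical): splits at the midpoint of the class means."""

    def __init__(self, chunk: Chunk) -> None:
        pos = [x for x, y in chunk if y == 1]
        neg = [x for x, y in chunk if y == 0]
        mean_pos = sum(pos) / len(pos) if pos else 0.0
        mean_neg = sum(neg) / len(neg) if neg else 0.0
        self.cut = (mean_pos + mean_neg) / 2
        # Direction of the positive class relative to the cut point.
        self.sign = 1 if mean_pos >= mean_neg else -1

    def predict(self, x: float) -> int:
        return 1 if self.sign * (x - self.cut) > 0 else 0


class ChunkWeightedEnsemble:
    """Keeps one learner per chunk; weights decay with error on the newest chunk."""

    def __init__(self, theta: float = 0.1) -> None:
        self.theta = theta  # single pruning threshold (assumed semantics)
        self.members: List[list] = []  # each entry is [weight, learner]

    def predict(self, x: float) -> int:
        # Weighted vote: each member votes +weight for class 1, -weight for class 0.
        score = sum(w * (1 if m.predict(x) == 1 else -1) for w, m in self.members)
        return 1 if score > 0 else 0

    def update(self, chunk: Chunk) -> None:
        # Re-weight existing members by their error rate on the current chunk only,
        # so no earlier data has to be stored.
        for member in self.members:
            errors = sum(member[1].predict(x) != y for x, y in chunk)
            member[0] *= 1.0 - errors / len(chunk)
        # Drop members whose weight fell below the threshold (bounds ensemble size).
        self.members = [m for m in self.members if m[0] >= self.theta]
        # Train a fresh learner on the current chunk with full weight.
        self.members.append([1.0, ThresholdStump(chunk)])


ensemble = ChunkWeightedEnsemble(theta=0.1)
ensemble.update([(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)])  # initial concept
ensemble.update([(1.0, 1), (2.0, 1), (8.0, 0), (9.0, 0)])  # labels flip: drift
```

In this toy run the first learner misclassifies every point of the drifted chunk, so its weight drops to zero and it is pruned, while a learner that keeps performing well on new chunks would retain its weight.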


Author(s):  
LAKSHMI PRANEETHA

Nowadays, data streams are massive and fast-changing. Their uses range from basic scientific applications to vital business and financial ones. In the online phase, useful information is abstracted from the stream and represented in the form of micro-clusters. In the offline phase, the micro-clusters are merged to form macro-clusters. The DBSTREAM technique captures the density between micro-clusters by means of a shared density graph in the online phase. The density data in this graph is then used in reclustering to improve cluster formation, but DBSTREAM takes more time to handle corrupted data points. In this paper, an early pruning algorithm is applied before pre-processing of the data, and a Bloom filter is used to recognize corrupted information. Our experiments on real-time datasets show that this approach improves the efficiency of macro-clusters by 90% and generates more micro-clusters within a short time.
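As a rough illustration of the Bloom filter mentioned in this abstract, the sketch below shows the general data structure only. How the paper applies it is not stated in the abstract, so the usage (recording keys of known-corrupted records and checking incoming points against them), the class name `BloomFilter`, the sizes, and the key format are all assumptions.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str) -> Iterator[int]:
        # Derive several hash positions by salting SHA-256 with a seed index.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # True means "possibly seen"; False means "definitely not seen".
        return all(self.bits[pos] for pos in self._positions(item))


corrupted = BloomFilter()
corrupted.add("sensor-7:NaN")  # hypothetical key for a corrupted record
skip_point = corrupted.might_contain("sensor-7:NaN")  # candidate for early pruning
```

Because a Bloom filter can report false positives, a design like this would occasionally prune a valid point; whether that trade-off is acceptable depends on the stream and is not addressed in the abstract.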

