Online GBDT with Chunk Dynamic Weighted Majority Learners for Noisy and Drifting Data Streams

Concept drift in data streams jeopardizes the accuracy and stability of online learning. If the stream is also imbalanced, detecting and handling the drift becomes even more challenging. In the literature, these two problems have been studied intensively but separately, and their joint occurrence has yet to be well studied. In this paper, we propose a chunk-based incremental learning method called Dynamic Weighted Majority for Imbalance Learning (DWMIL) to handle data streams that exhibit both concept drift and class imbalance. DWMIL uses an ensemble framework that dynamically weights its base classifiers according to their performance on the current data chunk. Compared with existing methods, it has four merits: (1) it remains stable on non-drifting streams and adapts quickly to new concepts; (2) it is fully incremental, i.e., no previous data needs to be stored; (3) it keeps a limited number of classifiers, which ensures high efficiency; and (4) it is simple and needs only one threshold parameter. Experiments on both synthetic and real data sets with concept drift show that DWMIL outperforms state-of-the-art competitors at a lower computational cost.
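The chunk-level weighting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' DWMIL implementation: the nearest-centroid base learner, the use of G-mean as the per-chunk performance score, the multiplicative weight update, and the `max_size` cap are all choices made for this sketch, and the names `ChunkWeightedEnsemble`, `train_centroid`, and `g_mean` are hypothetical. On each new chunk, every member's weight is multiplied by its G-mean on that chunk, members whose weight falls below the threshold `theta` are dropped, and one classifier trained on the chunk itself is added, so no past data is stored. Requires Python 3.8+ (`math.dist`, `math.prod`).

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Classifier = Callable[[Vector], int]


def train_centroid(X: Sequence[Vector], y: Sequence[int]) -> Classifier:
    """Hypothetical base learner: predict the class whose centroid is nearest."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(x: Vector) -> int:
        return min(centroids, key=lambda c: math.dist(x, centroids[c]))

    return predict


def g_mean(clf: Classifier, X: Sequence[Vector], y: Sequence[int]) -> float:
    """Geometric mean of per-class recall over the classes present in the chunk."""
    hits: Dict[int, int] = {}
    totals: Dict[int, int] = {}
    for x, label in zip(X, y):
        totals[label] = totals.get(label, 0) + 1
        if clf(x) == label:
            hits[label] = hits.get(label, 0) + 1
    recalls = [hits.get(c, 0) / n for c, n in totals.items()]
    return math.prod(recalls) ** (1.0 / len(recalls))


class ChunkWeightedEnsemble:
    """Sketch of chunk-based dynamic weighted majority voting (assumed update rule)."""

    def __init__(self, theta: float = 0.5, max_size: int = 10) -> None:
        self.theta = theta        # prune members whose weight drops below this
        self.max_size = max_size  # hypothetical hard cap on ensemble size
        self.members: List[Tuple[Classifier, float]] = []  # (classifier, weight)

    def update(self, X: Sequence[Vector], y: Sequence[int]) -> None:
        # 1. Rescale each existing member's weight by its G-mean on the new chunk.
        rescored = [(clf, w * g_mean(clf, X, y)) for clf, w in self.members]
        # 2. Drop members whose weight fell below the threshold.
        self.members = [(clf, w) for clf, w in rescored if w >= self.theta]
        # 3. Add a member trained only on this chunk (placed first so it wins ties).
        self.members.insert(0, (train_centroid(X, y), 1.0))
        # 4. Keep only the highest-weighted members if the cap is exceeded.
        self.members.sort(key=lambda m: m[1], reverse=True)
        del self.members[self.max_size:]

    def predict(self, x: Vector) -> int:
        # Weighted majority vote; requires at least one prior call to update().
        votes: Dict[int, float] = {}
        for clf, w in self.members:
            label = clf(x)
            votes[label] = votes.get(label, 0.0) + w
        return max(votes, key=votes.get)


if __name__ == "__main__":
    ens = ChunkWeightedEnsemble(theta=0.5)
    old_concept = ([(0.0, 0.0), (0.2, 0.1), (5.0, 5.0)], [0, 0, 1])  # imbalanced chunk
    ens.update(*old_concept)
    ens.update(*old_concept)
    print(len(ens.members), ens.predict((4.8, 5.1)))  # 2 1
    drifted = ([(0.0, 0.0), (0.1, 0.2), (5.0, 5.0)], [1, 1, 0])  # labels flip
    ens.update(*drifted)
    print(len(ens.members), ens.predict((4.8, 5.1)))  # 1 0
```

G-mean is used here because it multiplies per-class recalls: a member that misses the whole minority class scores 0 even when its overall accuracy is high. In the demo, both stale members score 0 on the drifted chunk and are pruned at once, which mirrors the "stable, then quick to adapt" behavior the abstract describes. The `max_size` cap is an extra safeguard added for this sketch; the abstract states that DWMIL itself needs only one threshold parameter.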


Online Feature Extraction Algorithms for Data Streams

IEEJ Transactions on Electronics Information and Systems, Vol 132 (1), pp. 6-13, 2012
DOI: 10.1541/ieejeiss.132.6

Author(s): Seiichi Ozawa

Keyword(s): Feature Extraction, Data Streams


Filtering of Mixed Data Streams with Orthogonal Polarization up to 50 Gbps in Micro-Ring/Bus Waveguide

2019 24th OptoElectronics and Communications Conference (OECC) and 2019 International Conference on Photonics in Switching and Computing (PSC), 2019
DOI: 10.23919/ps.2019.8817775

Author(s): Zih-Chun Su, Chih-Hsien Cheng, Bo-Ji Huang, Huai-Yung Wang, Chun-Nien Liu, ...

Keyword(s): Data Streams, Mixed Data, Orthogonal Polarization


Improved Macro-clusters generation using Top-k shared Micro-clusters in Data Streams

International Journal of Advanced Research in Computer Science and Software Engineering, Vol 7 (10), pp. 52, 2017
DOI: 10.23956/ijarcsse.v7i10.400

Author(s): LAKSHMI PRANEETHA

Keyword(s): Real Time, Data Streams, Bloom Filter, Scientific Applications, Pruning Algorithm, Density Data, Data Points, Short Time, Information Streams

Nowadays, data streams (information streams) are enormous and change rapidly. Their uses range from basic logical and scientific applications to critical business and financial ones. In the online phase, useful information is abstracted from the stream and represented as micro-clusters; in the offline phase, micro-clusters are merged to form macro-clusters. The DBSTREAM technique captures the density between micro-clusters by means of a shared density graph in the online phase, and the density information in this graph is then used during reclustering to improve cluster formation. However, DBSTREAM takes more time to handle corrupted data points. In this paper, an early pruning algorithm is applied before preprocessing, and a Bloom filter is used to recognize corrupted data. Our experiments on real-time datasets show that this approach improves the efficiency of macro-cluster generation by 90% and produces more micro-clusters within a short time.
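The online/offline split and the shared density graph described above can be illustrated with a simplified sketch. It loosely follows DBSTREAM's idea: in the online phase, a point that falls inside several micro-clusters increments their pairwise shared density; in the offline phase, micro-clusters whose shared density is high relative to their weights are connected and merged into macro-clusters. In this sketch, connectivity is taken as the shared density divided by the mean weight of the two micro-clusters. DBSTREAM's fading, center movement, and weak-cluster cleanup are omitted, and the paper's early pruning and Bloom filter steps are not implemented. The class name `SharedDensitySketch`, the radius `r`, and the threshold `alpha` are assumptions made for illustration. Requires Python 3.8+ (`math.dist`).

```python
import math
from typing import Dict, List, Sequence, Tuple


class SharedDensitySketch:
    """Simplified DBSTREAM-style clusterer: no fading, no center movement, no cleanup."""

    def __init__(self, r: float, alpha: float) -> None:
        self.r = r            # micro-cluster radius
        self.alpha = alpha    # minimum connectivity for merging micro-clusters
        self.centers: List[Tuple[float, ...]] = []
        self.weights: List[float] = []
        self.shared: Dict[Tuple[int, int], float] = {}  # (i, j) with i < j -> s_ij

    def insert(self, x: Sequence[float]) -> None:
        """Online phase: update micro-cluster weights and the shared density graph."""
        near = [i for i, c in enumerate(self.centers) if math.dist(c, x) <= self.r]
        if not near:
            # x is not covered by any micro-cluster: start a new one centered at x.
            self.centers.append(tuple(x))
            self.weights.append(1.0)
            return
        for i in near:
            self.weights[i] += 1.0
        # x lies in the overlap of every pair of covering micro-clusters.
        for a in range(len(near)):
            for b in range(a + 1, len(near)):
                key = (near[a], near[b])
                self.shared[key] = self.shared.get(key, 0.0) + 1.0

    def macro_clusters(self) -> List[List[int]]:
        """Offline phase: merge micro-clusters whose connectivity reaches alpha."""
        parent = list(range(len(self.centers)))

        def find(i: int) -> int:
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        for (i, j), s in self.shared.items():
            # Connectivity: shared density relative to the mean micro-cluster weight.
            if s / ((self.weights[i] + self.weights[j]) / 2) >= self.alpha:
                parent[find(i)] = find(j)

        groups: Dict[int, List[int]] = {}
        for i in range(len(self.centers)):
            groups.setdefault(find(i), []).append(i)
        return list(groups.values())


if __name__ == "__main__":
    sketch = SharedDensitySketch(r=1.0, alpha=0.3)
    for p in [(0.0, 0.0), (1.5, 0.0), (0.75, 0.0), (0.8, 0.0), (10.0, 10.0)]:
        sketch.insert(p)
    print(sketch.macro_clusters())  # [[0, 1], [2]]
```

In the demo, the two points near x = 0.75 fall inside both of the first two micro-clusters, so their shared density (2) is large relative to their weights (3 each) and they merge into one macro-cluster, while the isolated point at (10, 10) stays on its own. Raising `alpha` above 2/3 would keep all three micro-clusters separate.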
