Fast online computation of the Qn estimator with applications to the detection of outliers in data streams

Abstract: We present afqn (Approximate Fast Qn), a novel algorithm for approximately computing the Qn scale estimator in a streaming setting, under the sliding window model. It is well known that computing the Qn estimator exactly may be too costly for some applications, and the problem is exacerbated a fortiori in the streaming setting, where the time available to process each incoming item is short. In this paper we show how to approximate the Qn estimator efficiently and accurately. As an application, we show how afqn can be used for fast detection of outliers in data streams: the outliers are detected in the sliding window model with a simple check based on the Qn scale estimator. Extensive experimental results on synthetic and real datasets confirm the validity of our approach, showing up to three times as many updates per second. Our contributions are the following: (i) to the best of our knowledge, we present the first approximation algorithm for online computation of the Qn scale estimator in a streaming setting and in the sliding window model; (ii) we show how to take advantage of our UDDSketch algorithm for quantile estimation in order to compute the Qn scale estimator quickly; (iii) as an example of a possible application of the Qn scale estimator, we discuss how to detect outliers in an input data stream.
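For reference, the quantity that afqn approximates can be computed exactly by brute force. The sketch below is a naive baseline, not afqn and not UDDSketch. It follows the standard Rousseeuw–Croux definition Qn = d · {|x_i − x_j|; i < j}_(k), with h = ⌊n/2⌋ + 1 and k = C(h, 2), using the asymptotic Gaussian constant d ≈ 2.2219 and no small-sample correction. The outlier rule |x − median| / Qn > threshold, the default threshold of 3, and the names `qn_naive` and `SlidingWindowQnDetector` are illustrative assumptions, not the check defined in the paper.

```python
import math
from collections import deque
from itertools import combinations
from statistics import median

# Asymptotic consistency factor for Gaussian data (no small-sample correction).
QN_CONSTANT = 2.2219


def qn_naive(values):
    """Exact Qn by brute force over all pairwise distances: O(n^2 log n) per call."""
    values = list(values)
    n = len(values)
    if n < 2:
        raise ValueError("Qn needs at least two observations")
    h = n // 2 + 1
    k = math.comb(h, 2)  # 1-based rank of the order statistic to select
    diffs = sorted(abs(a - b) for a, b in combinations(values, 2))
    return QN_CONSTANT * diffs[k - 1]


class SlidingWindowQnDetector:
    """Hypothetical detector: flags x if |x - median| / Qn exceeds a threshold.

    The window keeps only the last `window_size` items (sliding window model).
    Every item, outlier or not, is appended after it is checked.
    """

    def __init__(self, window_size, threshold=3.0):
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def update(self, x):
        is_outlier = False
        if len(self.window) >= 2:
            scale = qn_naive(self.window)
            if scale > 0:
                is_outlier = abs(x - median(self.window)) / scale > self.threshold
        self.window.append(x)
        return is_outlier


if __name__ == "__main__":
    detector = SlidingWindowQnDetector(window_size=5, threshold=3.0)
    print([detector.update(v) for v in [1, 2, 3, 4, 5, 100, 3.5]])
```

Running the example prints `[False, False, False, False, False, True, False]`: the value 100 is flagged, and because Qn has a 50% breakdown point it stays robust to that value once it has entered the window, so 3.5 is not flagged. Recomputing all pairwise distances on every update is quadratic in the window size, which is the cost an approximate, sketch-based approach is meant to avoid.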


Online Feature Extraction Algorithms for Data Streams

IEEJ Transactions on Electronics Information and Systems

10.1541/ieejeiss.132.6

2012

Vol 132 (1)

pp. 6-13

Author(s):

Seiichi Ozawa

Keyword(s):

Feature Extraction

Data Streams


Filtering of Mixed Data Streams with Orthogonal Polarization up to 50 Gbps in Micro-Ring/Bus Waveguide

2019 24th OptoElectronics and Communications Conference (OECC) and 2019 International Conference on Photonics in Switching and Computing (PSC)

10.23919/ps.2019.8817775

2019

Author(s):

Zih-Chun Su

Chih-Hsien Cheng

Bo-Ji Huang

Huai-Yung Wang

Chun-Nien Liu

...

Keyword(s):

Data Streams

Mixed Data

Orthogonal Polarization


Improved Macro-clusters generation using Top-k shared Micro-clusters in Data Streams

International Journal of Advanced Research in Computer Science and Software Engineering

10.23956/ijarcsse.v7i10.400

2017

Vol 7 (10)

pp. 52

Author(s):

LAKSHMI PRANEETHA

Keyword(s):

Real Time

Data Streams

Bloom Filter

Scientific Applications

Pruning Algorithm

Density Data

Data Points

Short Time

Information Streams

Nowadays, data streams (or information streams) are enormous and change rapidly. Their uses range from basic logical and scientific applications to critical business and financial ones. In the online phase, useful information is extracted from the stream and represented in the form of micro-clusters; in the offline phase, micro-clusters are merged to form macro-clusters. The DBSTREAM technique captures the density between micro-clusters by means of a shared density graph in the online phase. The density data in this graph are then used during reclustering to improve cluster formation, but DBSTREAM spends considerable time handling corrupted data points. In this paper, an early pruning algorithm is applied before the information is pre-processed, and a Bloom filter is used to recognize corrupted information. Our experiments on real-time datasets show that this approach improves macro-cluster efficiency by 90% and generates more micro-clusters within a short time.
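The abstract does not say how the Bloom filter is keyed, so the following is only a minimal sketch of the general data structure under a stated assumption: signatures of records already known to be corrupted are inserted into the filter, and each incoming point is tested before it reaches micro-clustering. The names `BloomFilter`, `might_contain` and `filter_stream`, the bit-array size, the number of hashes, and the SHA-256-based hashing scheme are all hypothetical choices, not the paper's design. A Bloom filter never gives false negatives but can give false positives, so a few clean points may occasionally be dropped.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: each item sets k positions in an m-slot bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive k positions by salting SHA-256 with the hash index (assumed scheme).
        data = repr(item).encode("utf-8")
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely not added"; True means "probably added".
        return all(self.bits[pos] for pos in self._positions(item))


def filter_stream(points, corrupted_filter):
    """Hypothetical pre-processing step: drop points flagged as corrupted."""
    for point in points:
        if not corrupted_filter.might_contain(point):
            yield point


if __name__ == "__main__":
    corrupted = BloomFilter()
    corrupted.add(("sensor-7", -999.0))
    stream = [("sensor-1", 20.5), ("sensor-7", -999.0), ("sensor-2", 21.0)]
    print(list(filter_stream(stream, corrupted)))
```

Membership tests cost k hash evaluations regardless of how many corrupted signatures have been inserted, which is why a Bloom filter suits a per-item check on a fast stream; the trade-off is the false-positive rate, which grows as the bit array fills up.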
