AFQN: approximate Qn estimation in data streams

Applied Intelligence ◽

10.1007/s10489-021-02614-w ◽

2021 ◽

Author(s):

Italo Epicoco ◽

Catiuscia Melle ◽

Massimo Cafaro ◽

Marco Pulimeno

Keyword(s):

Data Streams ◽

Data Stream ◽

Input Data ◽

Sliding Window ◽

Quantile Estimation ◽

Fast Detection ◽

Online Computation ◽

First Approximation ◽

Detection Of Outliers ◽

Novel Algorithm

We present afqn (Approximate Fast Qn), a novel algorithm for approximate computation of the Qn scale estimator in a streaming setting, in the sliding window model. It is well known that computing the Qn estimator exactly may be too costly for some applications, and the problem is a fortiori exacerbated in the streaming setting, in which the time available to process incoming data stream items is short. In this paper we show how to efficiently and accurately approximate the Qn estimator. As an application, we show the use of afqn for fast detection of outliers in data streams. In particular, the outliers are detected in the sliding window model, with a simple check based on the Qn scale estimator. Extensive experimental results on synthetic and real datasets confirm the validity of our approach by showing up to three times faster updates per second. Our contributions are the following: (i) to the best of our knowledge, we present the first approximation algorithm for online computation of the Qn scale estimator in a streaming setting and in the sliding window model; (ii) we show how to take advantage of our UDDSketch algorithm for quantile estimation in order to quickly compute the Qn scale estimator; (iii) as an example of a possible application of the Qn scale estimator, we discuss how to detect outliers in an input data stream.
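For orientation, the exact quantity that afqn approximates can be sketched in a few lines. The block below is not the afqn algorithm and does not use UDDSketch; it is a naive quadratic-time baseline that recomputes the Rousseeuw-Croux Qn estimator over a fixed-size window. The consistency factor 2.2219, the class name `SlidingQnDetector`, the use of the window median as the center, and the threshold of 3 are all illustrative assumptions, not details taken from the paper.

```python
from collections import deque
from itertools import combinations
from statistics import median

# Assumed consistency factor for Gaussian data (Rousseeuw and Croux).
QN_CONSTANT = 2.2219


def qn_exact(values):
    """Naive O(n^2 log n) Qn: the k-th smallest pairwise absolute difference,
    with h = n // 2 + 1 and k = h * (h - 1) / 2, scaled by QN_CONSTANT."""
    n = len(values)
    if n < 2:
        raise ValueError("Qn needs at least two observations")
    h = n // 2 + 1
    k = h * (h - 1) // 2
    diffs = sorted(abs(a - b) for a, b in combinations(values, 2))
    return QN_CONSTANT * diffs[k - 1]


class SlidingQnDetector:
    """Hypothetical outlier check: flag x when it lies more than
    `threshold` Qn units away from the median of the current window."""

    def __init__(self, window_size, threshold=3.0):
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def update(self, x):
        is_outlier = False
        # Only test once the window is full, so the scale is meaningful.
        if len(self.window) == self.window.maxlen:
            scale = qn_exact(list(self.window))
            center = median(self.window)
            is_outlier = scale > 0 and abs(x - center) > self.threshold * scale
        self.window.append(x)  # the oldest item is evicted automatically
        return is_outlier


if __name__ == "__main__":
    detector = SlidingQnDetector(window_size=5)
    for item in [1, 2, 3, 4, 5, 100, 3]:
        print(item, detector.update(item))
```

Recomputing all pairwise differences on every arrival is exactly the per-item cost that becomes prohibitive in a stream, which is the motivation the abstract gives for an approximate method.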

Fast online computation of the Qn estimator with applications to the detection of outliers in data streams

Expert Systems with Applications ◽

10.1016/j.eswa.2020.113831 ◽

2021 ◽

Vol 164 ◽

pp. 113831

Author(s):

Massimo Cafaro ◽

Catiuscia Melle ◽

Marco Pulimeno ◽

Italo Epicoco

Keyword(s):

Data Streams ◽

Online Computation ◽

Detection Of Outliers

A Method for Processing Top-k Continuous Query on Uncertain Data Stream in Sliding Window Model

WSEAS TRANSACTIONS ON SYSTEMS AND CONTROL ◽

10.37394/23203.2021.16.22 ◽

2021 ◽

Vol 16 ◽

pp. 261-269

Author(s):

Raja Azhan Syah Raja Wahab ◽

Siti Nurulain Mohd Rum ◽

Hamidah Ibrahim ◽

Fatimah Sidi ◽

Iskandar Ishak

Keyword(s):

Query Processing ◽

Data Streams ◽

Data Stream ◽

Uncertain Data ◽

Research Work ◽

Computational Cost ◽

Sliding Window ◽

Possible World ◽

Processing Methods ◽

Uncertain Data Streams

A data stream is a series of data generated sequentially over time from different sources. Processing such data is very important in many contemporary applications such as sensor networks, RFID technology, mobile computing and many more. The huge amount of data generated and its frequent changes over short periods make conventional processing methods insufficient. The Sliding Window Model (SWM) was introduced by Datar et al. to handle this problem. Avoiding multiple scans of the whole data set, optimizing memory usage, and processing only the most recent tuples are the main challenges. The number of possible world instances grows exponentially in uncertain data, and it is highly difficult to determine what it takes to perform top-k query processing in the shortest amount of time. Following the generation of rules and the probability theory of this model, a framework was envisaged to sustain a top-k processing algorithm over the SWM approach until the candidates expire. Based on the literature review, no existing work tackles the issues that arise from top-k query processing over the possible world instances of uncertain data streams within the SWM. The major issues resulting from these scenarios need to be addressed, especially the computational redundancy that contributes to the increase in computational cost within the SWM. Therefore, the main objective of this research work is to propose top-k query processing methods over uncertain data streams in the SWM, utilizing the score and the Possible World (PW) setting. In this study, a novel expiration and object indexing method is introduced to address the computational redundancy issues. We believe the proposed method can reduce computational costs by managing the insertion and exit policy for the right tuple candidates within a specified window frame. This research work will contribute to the area of computational query processing.

Targeted Adaptable Sample for Accurate and Efficient Quantile Estimation in Non-Stationary Data Streams

Machine Learning and Knowledge Extraction ◽

10.3390/make1030049 ◽

2019 ◽

Vol 1 (3) ◽

pp. 848-870

Author(s):

Ognjen Arandjelović

Keyword(s):

Data Streams ◽

Data Stream ◽

Buffer Capacity ◽

Comprehensive Evaluation ◽

Estimation Algorithm ◽

Quantile Estimation ◽

Motion Features ◽

Synthetic Datasets ◽

High Level ◽

Stochastic Properties

The need to detect outliers or otherwise unusual data, which can be formalized as the estimation of a particular quantile of a distribution, is an important problem that frequently arises in a variety of applications of pattern recognition, computer vision and signal processing. For example, our work was most proximally motivated by the practical limitations and requirements of many semi-automatic surveillance analytics systems that detect abnormalities in closed-circuit television (CCTV) footage using statistical models of low-level motion features. In this paper, we specifically address the problem of estimating the running quantile of a data stream with non-stationary stochasticity when the absolute (rather than asymptotic) memory for storing observations is severely limited. We make several major contributions: (i) we derive an important theoretical result that shows that the change in the quantile of a stream is constrained regardless of the stochastic properties of data; (ii) we describe a set of high-level design goals for an effective estimation algorithm that emerge as a consequence of our theoretical findings; (iii) we introduce a novel algorithm that implements the aforementioned design goals by retaining a sample of data values in a manner adaptive to changes in the distribution of data and progressively narrowing down its focus in the periods of quasi-stationary stochasticity; and (iv) we present a comprehensive evaluation of the proposed algorithm and compare it with the existing methods in the literature on both synthetic datasets and three large “real-world” streams acquired in the course of operation of an existing commercial surveillance system. Our results and their detailed analysis convincingly and comprehensively demonstrate that the proposed method is highly successful and vastly outperforms the existing alternatives, especially when the target quantile is high-valued and the available buffer capacity severely limited.

An Approximate Approach for Maintaining Recent Occurrences of Itemsets in a Sliding Window over Data Streams

Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development ◽

10.4018/978-1-60566-748-5.ch014 ◽

2010 ◽

pp. 308-327

Author(s):

Jia-Ling Koh ◽

Shu-Ning Shin ◽

Yuan-Bin Don

Keyword(s):

Data Streams ◽

Data Stream ◽

Traditional Approach ◽

Experimental Studies ◽

Dynamic Environment ◽

Sliding Window ◽

Fixed Time ◽

Frequent Itemsets ◽

Embedded Knowledge ◽

Data Elements

The data stream, an unbounded sequence of data elements generated at a rapid rate, provides a dynamic environment for collecting data sources. It is likely that the embedded knowledge in a data stream will change quickly as time goes by. Therefore, catching the recent trend of data is an important issue when mining frequent itemsets over data streams. Although the sliding window model offers a good solution to this problem, the traditional approach has to maintain the complete occurrence information of patterns within a sliding window. For estimating the approximate supports of patterns within a sliding window, the frequency changing point (FCP) method is proposed for monitoring the recent occurrences of itemsets over a data stream. In addition to a basic design proposed under the assumption that exactly one transaction arrives at each time point, the FCP method is extended to maintain recent patterns over a data stream where a block containing a varying number of transactions (zero or more) arrives within a fixed time unit. Accordingly, the recently frequent itemsets or representative patterns are discovered approximately from the maintained structure. Experimental studies demonstrate that the proposed algorithms achieve high true positive rates and guarantee no false dismissals in the results yielded. A theoretical analysis is provided for the guarantee. In addition, the authors’ approach significantly reduces run-time memory usage compared with the previously proposed method.

Data Stream Frequent Closed Item Sets Mining Based on Fast Sliding Window

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.130-134.3702 ◽

2011 ◽

Vol 130-134 ◽

pp. 3702-3707

Author(s):

Zhi Hua Chen ◽

Jun Luo

Keyword(s):

Data Streams ◽

Data Stream ◽

Screening Method ◽

Hash Table ◽

Sliding Window ◽

Threshold Value ◽

Space Efficiency ◽

Frequent Item ◽

Support Threshold ◽

Frequent Item Sets

Given the mobility and continuity of data streams, this paper presents an algorithm called NSWR that mines frequent item sets from a fast sliding window over data streams, meeting the need to obtain the frequent item sets of recently arrived data. NWSR uses an effective bit-sequence representation of items, based on the data stream sliding window, to store data; it supports queries with different support threshold values through a hash-table-based method for querying frequent closed item set results; and it offers a screening method, based on the classification of closed item sets, that reduces the number of item sets requiring closure judgments and thereby the computational complexity. Experiments show that the algorithm has better time and space efficiency.
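The bit-sequence idea mentioned in this abstract can be illustrated generically. The sketch below is not the paper's algorithm; it only shows one common way to keep one bit per transaction for each item in a sliding window, so that the support of an item set is the number of set bits in the AND of its items' sequences. The class name `BitWindow` and its methods are hypothetical, and closed item set screening and the hash-table query layer are not modelled.

```python
class BitWindow:
    """Sliding window over transactions; each item owns an integer bitmask
    in which bit 0 is the newest transaction and higher bits are older."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.mask = (1 << window_size) - 1  # keeps only the last N bits
        self.filled = 0                     # transactions currently in window
        self.bits = {}                      # item -> bitmask

    def add_transaction(self, items):
        items = set(items)
        self.filled = min(self.filled + 1, self.window_size)
        # Slide: shift every known item left by one and drop expired bits.
        for item in list(self.bits):
            shifted = (self.bits[item] << 1) & self.mask
            if item in items:
                shifted |= 1
            if shifted:
                self.bits[item] = shifted
            else:
                del self.bits[item]  # item no longer occurs in the window
        # Register items seen for the first time.
        for item in items:
            if item not in self.bits:
                self.bits[item] = 1

    def support(self, itemset):
        """Count the window transactions that contain every item in itemset."""
        acc = (1 << self.filled) - 1
        for item in itemset:
            acc &= self.bits.get(item, 0)
        return bin(acc).count("1")


if __name__ == "__main__":
    w = BitWindow(window_size=3)
    for t in [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"c"}]:
        w.add_transaction(t)
    print(w.support({"a"}), w.support({"a", "b"}), w.support({"c"}))
```

Sliding the window is a single shift per item, and expired transactions fall off the high end of the mask without any explicit deletion pass.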

An algorithm for arbitrary–order cumulant tensor calculation in a sliding window of data streams

International Journal of Applied Mathematics and Computer Science ◽

10.2478/amcs-2019-0015 ◽

2019 ◽

Vol 29 (1) ◽

pp. 195-206 ◽

Cited By ~ 3

Author(s):

Krzysztof Domino ◽

Piotr Gawron

Keyword(s):

Data Streams ◽

Data Stream ◽

Multivariate Data ◽

Sliding Window ◽

High Order ◽

Distributed Data ◽

Symmetric Tensors ◽

On Line ◽

Non Gaussian

High-order cumulant tensors carry information about statistics of non-normally distributed multivariate data. In this work we present a new efficient algorithm for calculation of cumulants of arbitrary orders in a sliding window for data streams. We show that this algorithm offers substantial speedups of cumulant updates compared with the current solutions. The proposed algorithm can be used for processing on-line high-frequency multivariate data and can find applications, e.g., in on-line signal filtering and classification of data streams. To present an application of this algorithm, we propose an estimator of non-Gaussianity of a data stream based on the norms of high order cumulant tensors. We show how to detect the transition from Gaussian distributed data to non-Gaussian ones in a data stream. In order to achieve high implementation efficiency of operations on super-symmetric tensors, such as cumulant tensors, we employ a block structure to store and calculate only one hyper-pyramid part of such tensors.

Random Tree Data Stream Classifier With Sliding Window Estimator And Concept Drift

Bioscience Biotechnology Research Communications ◽

10.21786/bbrc/12.1/25 ◽

2019 ◽

Vol 12 (1) ◽

pp. 219-228

Author(s):

Ebtesam Almalki ◽

Manal Abdullah

Keyword(s):

Data Stream ◽

Concept Drift ◽

Sliding Window ◽

Random Tree ◽

Tree Data

Sliding window based weighted maximal frequent pattern mining over data streams

Expert Systems with Applications ◽

10.1016/j.eswa.2013.07.094 ◽

2014 ◽

Vol 41 (2) ◽

pp. 694-708 ◽

Cited By ~ 64

Author(s):

Gangin Lee ◽

Unil Yun ◽

Keun Ho Ryu

Keyword(s):

Data Streams ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Sliding Window ◽

Frequent Pattern ◽

Maximal Frequent Pattern

Recurring concept memory management in data streams: exploiting data stream concept evolution to improve performance and transparency

Data Mining and Knowledge Discovery ◽

10.1007/s10618-021-00736-w ◽

2021 ◽

Author(s):

Ben Halstead ◽

Yun Sing Koh ◽

Patricia Riddle ◽

Russel Pears ◽

Mykola Pechenizkiy ◽

...

Keyword(s):

Data Streams ◽

Data Stream ◽

Memory Management ◽

Improve Performance ◽

Concept Evolution

Sliding window top-k dominating query processing over distributed data streams

Distributed and Parallel Databases ◽

10.1007/s10619-015-7187-9 ◽

2015 ◽

Vol 34 (4) ◽

pp. 535-566 ◽

Cited By ~ 6

Author(s):

Daichi Amagata ◽

Takahiro Hara ◽

Shojiro Nishio

Keyword(s):

Query Processing ◽

Data Streams ◽

Sliding Window ◽

Distributed Data ◽

Distributed Data Streams
