Streaming Data and Data Streams

Online mining of frequent closed itemsets over streaming data is one of the most important issues in mining data streams. In this paper, we propose a novel sliding-window-based algorithm. The algorithm exploits lattice properties to limit the search to frequent closed itemsets that share at least one item with the new transaction. Experimental results on synthetic datasets show that the proposed algorithm is both time and space efficient.
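
To make the terms in this abstract concrete, the following is a minimal brute-force sketch of mining frequent closed itemsets over a sliding window. It is not the lattice-based algorithm the paper proposes: it recomputes every itemset's support on each arrival, which is only workable for tiny windows. The names `closed_frequent_itemsets` and `SlidingWindowMiner` are hypothetical and chosen for illustration. An itemset is treated as closed when no proper superset has the same support.

```python
from collections import deque
from itertools import combinations


def closed_frequent_itemsets(window, min_support):
    """Return {itemset: support} for the closed frequent itemsets in `window`."""
    support = {}
    for transaction in window:
        items = sorted(transaction)
        # Count every non-empty subset of the transaction (exponential, sketch only).
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                support[key] = support.get(key, 0) + 1
    frequent = {s: c for s, c in support.items() if c >= min_support}
    # Closed: no proper superset has the same support. Any such superset
    # would itself be frequent, so searching `frequent` is sufficient.
    return {
        s: c
        for s, c in frequent.items()
        if not any(s < t and c == frequent[t] for t in frequent)
    }


class SlidingWindowMiner:
    """Keeps the last `window_size` transactions and re-mines on each arrival."""

    def __init__(self, window_size, min_support):
        self.window = deque(maxlen=window_size)  # oldest transaction drops out
        self.min_support = min_support

    def add(self, transaction):
        self.window.append(frozenset(transaction))
        return closed_frequent_itemsets(self.window, self.min_support)
```

With a window of three transactions and a minimum support of two, feeding `{"a", "b"}`, `{"a", "b", "c"}`, and `{"a", "c"}` leaves `{"a"}`, `{"a", "b"}`, and `{"a", "c"}` as the closed frequent itemsets; `{"b"}` is frequent but not closed, because `{"a", "b"}` has the same support.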

A Survey of Challenges Facing Streaming Data

Transactions on Machine Learning and Artificial Intelligence ◽

10.14738/tmlai.84.8579 ◽

2020 ◽

Vol 8 (4) ◽

pp. 63-73

Author(s):

Sikha Bagui ◽

Katie Jin

Keyword(s):

Data Reduction ◽

Data Streams ◽

Data Stream ◽

Stream Processing ◽

Streaming Data ◽

Data Detection ◽

Data Stream Processing ◽

The Face ◽

Concept Drifts

This survey thoroughly enumerates and analyzes existing methods for data stream processing, organized around the challenges facing streaming data. The challenges addressed are preprocessing of streaming data, detecting and dealing with concept drift, data reduction for data streams, approximate queries, and blocking operations.

EvolveCluster: an evolutionary clustering algorithm for streaming data

Evolving Systems ◽

10.1007/s12530-021-09408-y ◽

2021 ◽

Author(s):

Christian Nordahl ◽

Veselka Boeva ◽

Håkan Grahn ◽

Marie Persson Netz

Keyword(s):

Data Streams ◽

Data Stream ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Streaming Data ◽

Evolutionary Clustering ◽

Stream Clustering ◽

The Past ◽

Data Stream Clustering ◽

Evolving Data

Data has become an integral part of our society in the past years, arriving faster and in larger quantities than before. Traditional clustering algorithms rely on the availability of entire datasets to model them correctly and efficiently. Such requirements are not possible in the data stream clustering scenario, where data arrives and needs to be analyzed continuously. This paper proposes a novel evolutionary clustering algorithm, entitled EvolveCluster, capable of modeling evolving data streams. We compare EvolveCluster against two other evolutionary clustering algorithms, PivotBiCluster and Split-Merge Evolutionary Clustering, by conducting experiments on three different datasets. Furthermore, we perform additional experiments on EvolveCluster to further evaluate its capabilities on clustering evolving data streams. Our results show that EvolveCluster manages to capture evolving data stream behaviors and adapts accordingly.

Online Learning from Capricious Data Streams: A Generative Approach

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/346 ◽

2019 ◽

Cited By ~ 2

Author(s):

Yi He ◽

Baijun Wu ◽

Di Wu ◽

Ege Beyazit ◽

Sheng Chen ◽

...

Keyword(s):

Online Learning ◽

Data Streams ◽

Graphical Model ◽

Feature Space ◽

Streaming Data ◽

The Past ◽

Universal Feature ◽

New Feature ◽

Space Dynamics ◽

Generative Approach

Learning with streaming data has received extensive attention during the past few years. Existing approaches assume the feature space is fixed or changes by following explicit regularities, limiting their applicability in dynamic environments where the data streams are described by an arbitrarily varying feature space. To handle such capricious data streams, in this paper we develop a novel algorithm, named OCDS (Online learning from Capricious Data Streams), which does not make any assumption about feature space dynamics. OCDS trains a learner on a universal feature space that establishes relationships between old and new features, so that the patterns learned in the old feature space can be used in the new feature space. Specifically, the universal feature space is constructed by leveraging the relatedness among features. We propose a generative graphical model to model the construction process, and show with theoretical analysis that learning from the universal feature space can effectively improve performance. The experimental results demonstrate that OCDS achieves conspicuous performance on synthetic and real datasets.

TADILOF: Time Aware Density-Based Incremental Local Outlier Detection in Data Streams

Sensors ◽

10.3390/s20205829 ◽

2020 ◽

Vol 20 (20) ◽

pp. 5829 ◽

Cited By ~ 1

Author(s):

Jen-Wei Huang ◽

Meng-Xun Zhong ◽

Bijay Prasad Jaysawal

Keyword(s):

Outlier Detection ◽

Data Streams ◽

Data Stream ◽

State Of The Art ◽

Streaming Data ◽

Current State ◽

Data Points ◽

Local Outlier ◽

Time Aware ◽

Over Time

Outlier detection in data streams is crucial to successful data mining. However, this task is made increasingly difficult by the enormous growth in the quantity of data generated by the expansion of the Internet of Things (IoT). Recent advances in outlier detection based on density-based local outlier factor (LOF) algorithms do not consider variations in data that change over time. For example, a new cluster of data points may appear in the data stream over time. Therefore, we present a novel algorithm for streaming data, referred to as time-aware density-based incremental local outlier detection (TADILOF), to overcome this issue. In addition, we have developed a means for estimating the LOF score, termed "approximate LOF," based on historical information following the removal of outdated data. The results of experiments demonstrate that TADILOF outperforms current state-of-the-art methods in terms of AUC while achieving similar performance in terms of execution time. Moreover, we present an application of the proposed scheme to the development of an air-quality monitoring system.
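
For readers unfamiliar with the LOF score that this abstract builds on, here is a minimal sketch of the standard batch LOF computation in plain Python. It is not TADILOF and not the paper's "approximate LOF": there is no incremental update, no time awareness, and no removal of outdated data. The function name `lof_scores` is hypothetical, and the sketch assumes distinct points, `k` smaller than the number of points, and exactly `k` neighbours per point (ties at the k-th distance are broken by index rather than all included).

```python
import math


def lof_scores(points, k):
    """Batch local outlier factor for a small list of coordinate tuples."""
    n = len(points)
    # k nearest neighbours of each point as (distance, index), excluding itself.
    neighbours = []
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        neighbours.append(dists[:k])
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_distance = [nb[-1][0] for nb in neighbours]

    def local_reachability_density(i):
        # Reachability distance from i to neighbour j is max(k_distance[j], d(i, j)).
        reach = [max(k_distance[j], d) for d, j in neighbours[i]]
        return len(reach) / sum(reach)

    density = [local_reachability_density(i) for i in range(n)]
    # LOF: average neighbour density divided by the point's own density.
    return [
        sum(density[j] for _, j in neighbours[i]) / (k * density[i])
        for i in range(n)
    ]
```

Points inside a uniform cluster score close to 1, while a point whose neighbours are much denser than it is scores well above 1. In a streaming setting one could call this on the contents of a sliding window, at the cost of recomputing everything on each arrival.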

Efficient Mining of Data Streams Using Associative Classification Approach

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194015500059 ◽

2015 ◽

Vol 25 (03) ◽

pp. 605-631 ◽

Cited By ~ 6

Author(s):

Prasanna Lakshmi Kompalli ◽

Ramesh Kumar Cherku

Keyword(s):

Data Streams ◽

Processing Time ◽

Real Data ◽

Streaming Data ◽

Infinite Length ◽

Associative Classification ◽

Streaming Algorithm ◽

Scan Data ◽

Synthetic Datasets ◽

And Performance

Data stream associative classification poses many challenges to the data mining community. In this paper, we address four major challenges, namely infinite length, extraction of knowledge in a single scan, processing time, and accuracy. Since data streams are infinite in length, it is impractical to store and use all the historical data for training. Mining such streaming data for knowledge acquisition is both a unique opportunity and a difficult task. A streaming algorithm must scan the data once and extract knowledge. When mining data streams, processing time and accuracy are two important aspects. In this paper, we propose PSTMiner, which considers the nature of data streams and provides an efficient classifier for predicting the class label of real data streams. It has greater potential when compared with many existing classification techniques. Additionally, we propose a compact novel tree structure called PSTree (Prefix Streaming Tree) for storing data. Extensive experiments conducted on 24 real datasets from the UCI repository and synthetic datasets from MOA (Massive Online Analysis) show that PSTMiner is consistent. Empirical results show that the performance of PSTMiner is highly competitive in terms of accuracy and processing time when compared with other approaches under the windowed streaming model.
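
The single-scan requirement described above can be illustrated with a generic prefix tree that is updated once per arriving record. This is a sketch under stated assumptions, not the PSTree structure from the paper, whose layout the abstract does not describe. The classes `PrefixNode` and `PrefixTree` are hypothetical; each node keeps per-class counts for the records whose sorted item list passes through it.

```python
from collections import Counter


class PrefixNode:
    def __init__(self):
        self.children = {}             # item -> PrefixNode
        self.class_counts = Counter()  # class label -> number of records


class PrefixTree:
    """Prefix tree over sorted item lists, built in one pass over the stream."""

    def __init__(self):
        self.root = PrefixNode()

    def insert(self, items, label):
        # Each record is touched exactly once; shared prefixes share nodes.
        node = self.root
        for item in sorted(items):
            node = node.children.setdefault(item, PrefixNode())
            node.class_counts[label] += 1

    def class_counts(self, prefix):
        # Counts cover records whose sorted items START with `prefix`,
        # not every record that merely contains those items.
        node = self.root
        for item in sorted(prefix):
            node = node.children.get(item)
            if node is None:
                return Counter()
        return node.class_counts
```

After inserting `(["a", "b"], "yes")`, `(["a", "c"], "no")`, and `(["a", "b", "c"], "yes")`, the node for prefix `["a", "b"]` holds two "yes" records, which is the kind of evidence an associative rule such as "a, b implies yes" would be scored on.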

TOWARDS THE DEVELOPMENT OF A TAXONOMY FOR VISUALISATION OF STREAMED GEOSPATIAL DATA

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsannals-iii-2-129-2016 ◽

2016 ◽

Vol III-2 ◽

pp. 129-136 ◽

Cited By ~ 1

Author(s):

B. H. Sibolla ◽

T. Van Zyl ◽

S. Coetzee

Keyword(s):

Open Source ◽

Data Streams ◽

Information Model ◽

Streaming Data ◽

Geospatial Data ◽

Human Interaction ◽

Two Dimensional ◽

Design Phase ◽

Data Visualisation ◽

Library Development

Geospatial data has very specific characteristics that need to be carefully captured in its visualisation, in order for the user and the viewer to gain knowledge from it. The science of visualisation has gained much traction over the last decade as a response to various visualisation challenges. During the development of an open-source, dynamic two-dimensional visualisation library that caters for geospatial streaming data, it was found necessary to conduct a review of existing geospatial visualisation taxonomies. The review was done in order to inform the design phase of the library development, such that an existing taxonomy can be either adopted or extended to fit the needs at hand. The major challenge in this case is to develop dynamic two-dimensional visualisations that enable human interaction in order to help the user understand data streams that are continuously being updated. This paper reviews the existing geospatial data visualisation taxonomies that have been developed over the years. Based on the review, an adopted taxonomy for visualisation of geospatial streaming data is presented. Example applications of this taxonomy are also provided. The adopted taxonomy will then be used to develop the information model for the visualisation library in a further study.

Anomaly Pattern Detection in Streaming Data Based on the Transformation to Multiple Binary-Valued Data Streams

Journal of Artificial Intelligence and Soft Computing Research ◽

10.2478/jaiscr-2022-0002 ◽

2021 ◽

Vol 12 (1) ◽

pp. 19-27

Author(s):

Taegong Kim ◽

Cheong Hee Park

Keyword(s):

Outlier Detection ◽

Data Streams ◽

Data Stream ◽

Detection Method ◽

Binary Classification ◽

Streaming Data ◽

Pattern Detection ◽

Detection Methods ◽

Anomaly Pattern ◽

Isolation Forest

Anomaly pattern detection in a data stream aims to detect a time point where outliers begin to occur abnormally. Recently, a method for anomaly pattern detection has been proposed based on binary classification for outliers and statistical tests in the data stream of binary labels of normal or outlier. It showed that an anomaly pattern can be detected accurately even when outlier detection performance is relatively low. However, since the anomaly pattern detection method is based on binary classification for outliers, most well-known outlier detection methods, which output real-valued outlier scores, cannot be used directly. In this paper, we propose an anomaly pattern detection method in a data stream using the transformation to multiple binary-valued data streams from real-valued outlier scores. Using three outlier detection methods, Isolation Forest (IF), autoencoder-based outlier detection, and local outlier factor (LOF), the proposed anomaly pattern detection method is tested on artificial and real data sets. The experimental results show that anomaly pattern detection using Isolation Forest gives the best performance.
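
One way to picture the transformation from real-valued outlier scores to multiple binary-valued streams is to apply several cut-offs to the same score stream. The abstract does not say how the paper derives its binary streams, so the fixed thresholds and the function name `to_binary_streams` below are assumptions made purely for illustration.

```python
def to_binary_streams(scores, thresholds):
    """Map one stream of real-valued scores to one 0/1 stream per threshold.

    A value of 1 marks a score strictly above the threshold (flagged as outlier).
    """
    return {
        threshold: [1 if score > threshold else 0 for score in scores]
        for threshold in thresholds
    }


streams = to_binary_streams([0.1, 0.4, 0.9, 0.6], thresholds=[0.3, 0.5, 0.8])
# streams[0.3] == [0, 1, 1, 1]
# streams[0.5] == [0, 0, 1, 1]
# streams[0.8] == [0, 0, 1, 0]
```

Each resulting stream could then be passed to a statistical test on binary labels, with lower thresholds giving more sensitive streams and higher thresholds giving more conservative ones.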

TIFIM: Tree based Incremental Frequent Itemset Mining over Streaming Data

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v10i5.4149 ◽

2013 ◽

Vol 10 (5) ◽

pp. 1580-1586

Author(s):

V.sidda Reddy ◽

Dr T.V. Rao ◽

Dr A. Govardhan

Keyword(s):

Data Streams ◽

Data Stream ◽

Streaming Data ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Proposed Model ◽

Mining Model ◽

Mining Algorithms ◽

Memory Efficient

Data stream mining algorithms operate under constraints on the space used and the time taken, which arise from the streaming property. The relaxation of these constraints is inversely proportional to the streaming speed of the data. Since caching and mining streaming data is sensitive to these constraints, this paper devises a scalable, memory-efficient caching and frequent itemset mining model. The proposed model is an incremental approach that builds single-level multi-node trees, called bushes, from each window of the streaming data; henceforth we refer to the proposed algorithm as Tree (bush) based Incremental Frequent Itemset Mining (TIFIM) over data streams.

Retaining Data from Streams of Social Platforms with Minimal Regret

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/397 ◽

2017 ◽

Author(s):

Nguyen Thanh Tam ◽

Matthias Weidlich ◽

Duong Chi Thang ◽

Hongzhi Yin ◽

Nguyen Quoc Viet Hung

Keyword(s):

Social Media ◽

Data Streams ◽

Large Scale ◽

Information Quality ◽

Streaming Data ◽

Dynamic Nature ◽

Permanent Storage ◽

Reasonable Limit ◽

Efficient Processing ◽

Real World Datasets

Today's social platforms, such as Twitter and Facebook, continuously generate massive volumes of data. The resulting data streams exceed any reasonable limit for permanent storage, especially since data is often redundant, overlapping, sparse, and generally of low value. This calls for means to retain solely a small fraction of the data in an online manner. In this paper, we propose techniques to effectively decide which data to retain, such that the induced loss of information, the regret of neglecting certain data, is minimized. These techniques not only enable efficient processing of massive streaming data, but are also adaptive and address the dynamic nature of social media. Experiments on large-scale real-world datasets illustrate the feasibility of our approach in terms of both runtime and information quality.
