Streaming Data and Data Streams

Author(s):  
Taiwo Kolajo ◽  
Olawande Daramola ◽  
Ayodele Adebiyi
Keyword(s):  
2012 ◽  
Vol 256-259 ◽  
pp. 2910-2913
Author(s):  
Jun Tan

Online mining of frequent closed itemsets over streaming data is one of the most important issues in mining data streams. In this paper, we proposed a novel sliding window based algorithm. The algorithm exploits lattice properties to limit the search to frequent close itemsets which share at least one item with the new transaction. Experiments results on synthetic datasets show that our proposed algorithm is both time and space efficient.


2020 ◽  
Vol 8 (4) ◽  
pp. 63-73
Author(s):  
Sikha Bagui ◽  
Katie Jin

This survey performs a thorough enumeration and analysis of existing methods for data stream processing. It is a survey of the challenges facing streaming data. The challenges addressed are preprocessing of streaming data, detection and dealing with concept drifts in streaming data, data reduction in the face of data streams, approximate queries and blocking operations in streaming data.


2021 ◽  
Author(s):  
Christian Nordahl ◽  
Veselka Boeva ◽  
Håkan Grahn ◽  
Marie Persson Netz

AbstractData has become an integral part of our society in the past years, arriving faster and in larger quantities than before. Traditional clustering algorithms rely on the availability of entire datasets to model them correctly and efficiently. Such requirements are not possible in the data stream clustering scenario, where data arrives and needs to be analyzed continuously. This paper proposes a novel evolutionary clustering algorithm, entitled EvolveCluster, capable of modeling evolving data streams. We compare EvolveCluster against two other evolutionary clustering algorithms, PivotBiCluster and Split-Merge Evolutionary Clustering, by conducting experiments on three different datasets. Furthermore, we perform additional experiments on EvolveCluster to further evaluate its capabilities on clustering evolving data streams. Our results show that EvolveCluster manages to capture evolving data stream behaviors and adapts accordingly.


Author(s):  
Yi He ◽  
Baijun Wu ◽  
Di Wu ◽  
Ege Beyazit ◽  
Sheng Chen ◽  
...  

Learning with streaming data has received extensive attention during the past few years. Existing approaches assume the feature space is fixed or changes by following explicit regularities, limiting their applicability in dynamic environments where the data streams are described by an arbitrarily varying feature space. To handle such capricious data streams, we in this paper develop a novel algorithm, named OCDS (Online learning from Capricious Data Streams), which does not make any assumption on feature space dynamics. OCDS trains a learner on a universal feature space that establishes relationships between old and new features, so that the patterns learned in the old feature space can be used in the new feature space. Specifically, the universal feature space is constructed by leveraging the relatednesses among features. We propose a generative graphical model to model the construction process, and show that learning from the universal feature space can effectively improve performance with theoretical analysis. The experimental results demonstrate that OCDS achieves conspicuous performance on synthetic and real datasets.


Sensors ◽  
2020 ◽  
Vol 20 (20) ◽  
pp. 5829 ◽  
Author(s):  
Jen-Wei Huang ◽  
Meng-Xun Zhong ◽  
Bijay Prasad Jaysawal

Outlier detection in data streams is crucial to successful data mining. However, this task is made increasingly difficult by the enormous growth in the quantity of data generated by the expansion of Internet of Things (IoT). Recent advances in outlier detection based on the density-based local outlier factor (LOF) algorithms do not consider variations in data that change over time. For example, there may appear a new cluster of data points over time in the data stream. Therefore, we present a novel algorithm for streaming data, referred to as time-aware density-based incremental local outlier detection (TADILOF) to overcome this issue. In addition, we have developed a means for estimating the LOF score, termed "approximate LOF," based on historical information following the removal of outdated data. The results of experiments demonstrate that TADILOF outperforms current state-of-the-art methods in terms of AUC while achieving similar performance in terms of execution time. Moreover, we present an application of the proposed scheme to the development of an air-quality monitoring system.


Author(s):  
Prasanna Lakshmi Kompalli ◽  
Ramesh Kumar Cherku

Data stream associative classification poses many challenges to the data mining community. In this paper, we address four major challenges posed, namely, infinite length, extraction of knowledge with single scan, processing time, and accuracy. Since data streams are infinite in length, it is impractical to store and use all the historical data for training. Mining such streaming data for knowledge acquisition is a unique opportunity and even a tough task. A streaming algorithm must scan data once and extract knowledge. While mining data streams, processing time, and accuracy have become two important aspects. In this paper, we propose PSTMiner which considers the nature of data streams and provides an efficient classifier for predicting the class label of real data streams. It has greater potential when compared with many existing classification techniques. Additionally, we propose a compact novel tree structure called PSTree (Prefix Streaming Tree) for storing data. Extensive experiments conducted on 24 real datasets from UCI repository and synthetic datasets from MOA (Massive Online Analysis) show that PSTMiner is consistent. Empirical results show that performance of PSTMiner is highly competitive in terms of accuracy and performance time when compared with other approaches under windowed streaming model.


Author(s):  
B. H. Sibolla ◽  
T. Van Zyl ◽  
S. Coetzee

Geospatial data has very specific characteristics that need to be carefully captured in its visualisation, in order for the user and the viewer to gain knowledge from it. The science of visualisation has gained much traction over the last decade as a response to various visualisation challenges. During the development of an open source based, dynamic two-dimensional visualisation library, that caters for geospatial streaming data, it was found necessary to conduct a review of existing geospatial visualisation taxonomies. The review was done in order to inform the design phase of the library development, such that either an existing taxonomy can be adopted or extended to fit the needs at hand. The major challenge in this case is to develop dynamic two dimensional visualisations that enable human interaction in order to assist the user to understand the data streams that are continuously being updated. This paper reviews the existing geospatial data visualisation taxonomies that have been developed over the years. Based on the review, an adopted taxonomy for visualisation of geospatial streaming data is presented. Example applications of this taxonomy are also provided. The adopted taxonomy will then be used to develop the information model for the visualisation library in a further study.


Author(s):  
Taegong Kim ◽  
Cheong Hee Park

Abstract Anomaly pattern detection in a data stream aims to detect a time point where outliers begin to occur abnormally. Recently, a method for anomaly pattern detection has been proposed based on binary classification for outliers and statistical tests in the data stream of binary labels of normal or an outlier. It showed that an anomaly pattern can be detected accurately even when outlier detection performance is relatively low. However, since the anomaly pattern detection method is based on the binary classification for outliers, most well-known outlier detection methods, with the output of real-valued outlier scores, can not be used directly. In this paper, we propose an anomaly pattern detection method in a data stream using the transformation to multiple binary-valued data streams from real-valued outlier scores. By using three outlier detection methods, Isolation Forest(IF), Autoencoder-based outlier detection, and Local outlier factor(LOF), the proposed anomaly pattern detection method is tested using artificial and real data sets. The experimental results show that anomaly pattern detection using Isolation Forest gives the best performance.


2013 ◽  
Vol 10 (5) ◽  
pp. 1580-1586
Author(s):  
V.sidda Reddy ◽  
Dr T.V. Rao ◽  
Dr A. Govardhan

Data Stream Mining algorithms performs under constraints called space used and time taken, which is due to the streaming property. The relaxation in these constraints is inversely proportional to the streaming speed of the data. Since the caching and mining the streaming-data is sensitive, here in this paper a scalable, memory efficient caching and frequent itemset mining model is devised. The proposed model is an incremental approach that builds single level multi node trees called bushes from each window of the streaming data; henceforth we refer this proposed algorithm as a Tree (bush) based Incremental Frequent Itemset Mining (TIFIM) over data streams.


Author(s):  
Nguyen Thanh Tam ◽  
Matthias Weidlich ◽  
Duong Chi Thang ◽  
Hongzhi Yin ◽  
Nguyen Quoc Viet Hung

Today's social platforms, such as Twitter and Facebook, continuously generate massive volumes of data. The resulting data streams exceed any reasonable limit for permanent storage, especially since data is often redundant, overlapping, sparse, and generally of low value. This calls for means to retain solely a small fraction of the data in an online manner. In this paper, we propose techniques to effectively decide which data to retain, such that the induced loss of information, the regret of neglecting certain data, is minimized. These techniques enable not only efficient processing of massive streaming data, but are also adaptive and address the dynamic nature of social media. Experiments on large-scale real-world datasets illustrate the feasibility of our approach in terms of both, runtime and information quality.


Sign in / Sign up

Export Citation Format

Share Document