Incremental Algorithm for Discovering Frequent Subsequences in Multiple Data Streams

In recent years, applications that produce data streams, such as stock data and sensor networks, have become common, and finding frequent subsequences, or clusters of subsequences, in such streams has become an essential data mining task. Data streams are continuous, unbounded in size, and arrive at a high rate. Because of these characteristics, traditional clustering algorithms fail to find clusters in data streams effectively. This paper proposes an efficient incremental algorithm that finds frequent subsequences in multiple data streams by clustering subsequences of a data stream. The algorithm uses a window model to buffer the continuous streams. It does not recompute the clustering for the whole stream at every window; instead, it builds on the clustering results of previous windows. It also assigns a decay value to each discovered cluster to determine when to remove old clusters and retain recent ones. The algorithm is efficient because it scans the data streams only once, and it is an anytime algorithm because the frequent subsequences are available at the end of every window.
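The window-and-decay idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the distance-threshold assignment rule, the `Cluster` class, and the parameters `length`, `radius`, `decay`, `min_weight`, and `min_count` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Cluster:
    centroid: list   # running mean of the member subsequences
    count: int       # number of subsequences absorbed so far
    weight: float    # decayed recency score; a low weight means a stale cluster


def subsequences(window, length):
    """Slide over one buffered window and yield fixed-length subsequences."""
    for i in range(len(window) - length + 1):
        yield window[i:i + length]


def process_window(clusters, window, length=3, radius=1.0, decay=0.5, min_weight=0.1):
    """Update `clusters` in place from one window; earlier windows are never rescanned."""
    # Age every existing cluster once per window.
    for c in clusters:
        c.weight *= decay
    for sub in subsequences(window, length):
        nearest = min(clusters, key=lambda c: dist(c.centroid, sub), default=None)
        if nearest is not None and dist(nearest.centroid, sub) <= radius:
            # Absorb the subsequence with an incremental mean update.
            nearest.count += 1
            n = nearest.count
            nearest.centroid = [m + (x - m) / n for m, x in zip(nearest.centroid, sub)]
            nearest.weight += 1.0
        else:
            # No cluster is close enough, so start a new one.
            clusters.append(Cluster(list(sub), 1, 1.0))
    # Remove clusters whose weight has decayed below the threshold.
    clusters[:] = [c for c in clusters if c.weight >= min_weight]
    return clusters


def frequent(clusters, min_count):
    """Report clusters with enough members as frequent subsequences."""
    return [c for c in clusters if c.count >= min_count]
```

Calling `frequent(clusters, min_count)` after each `process_window` call gives a result at the end of every window, which is the anytime property in this sketch. Each stream would keep its own `clusters` list.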


Hierarchical clustering for multiple nominal data streams with evolving behaviour

Complex & Intelligent Systems ◽

10.1007/s40747-021-00634-0 ◽

2022 ◽

Author(s):

Jerry W. Sangma ◽

Mekhla Sarkar ◽

Vipin Pal ◽

Amit Agrawal ◽

Yogita

Keyword(s):

Hierarchical Clustering ◽

Data Streams ◽

Data Stream ◽

Research Work ◽

Entropy Measure ◽

Cophenetic Correlation ◽

Clustering Technique ◽

Multiple Data ◽

Variable Approach ◽

Multiple Data Streams

Abstract: Over the past decade, a number of attempts have been made at data stream clustering, but most of this work follows the clustering-by-example approach. A number of applications instead require the clustering-by-variable approach, which clusters multiple data streams rather than the data examples within a single stream. Furthermore, only a few works address clustering of multiple data streams, and these apply to numeric data streams only. This research gap motivates the present work, which proposes a hierarchical clustering technique for multiple data streams whose data are nominal. To address concept changes in the data streams, clusters in the hierarchical structure are split and merged. The decision to split or merge is based on an entropy measure that represents the cluster's degree of disparity. The performance of the proposed technique is analysed and compared with the Agglomerative Nesting clustering technique on synthetic datasets and a real-world dataset in terms of the Dunn Index, the Modified Hubert Γ statistic, the Cophenetic Correlation Coefficient, and Purity. The proposed technique outperforms Agglomerative Nesting for concept-evolving data streams. In addition, the effect of concept evolution on the clustering structure and the average entropy is visualised for detailed analysis and understanding.
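As an illustration of an entropy-based disparity measure for a cluster of nominal streams, here is a minimal sketch. The abstract does not give the paper's exact definition, so the per-time-step Shannon entropy used here, the function names, and the `threshold` value are assumptions.

```python
from collections import Counter
from math import log2


def symbol_entropy(symbols):
    """Shannon entropy (in bits) of a collection of nominal values."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def cluster_entropy(streams):
    """Average per-time-step entropy across the streams of one cluster.

    `streams` is a list of equal-length sequences of nominal values.
    A value of 0.0 means all member streams agree at every time step.
    """
    steps = list(zip(*streams))
    return sum(symbol_entropy(step) for step in steps) / len(steps)


def should_split(streams, threshold=0.5):
    """Hypothetical rule: split a cluster whose disparity exceeds `threshold`."""
    return cluster_entropy(streams) > threshold
```

Under this assumed measure, a cluster whose streams start to disagree after a concept change sees its entropy rise, which is the signal a split decision would react to.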


Consistent assimilation of multiple data streams in a carbon cycle data assimilation system

Geoscientific Model Development ◽

10.5194/gmd-9-3569-2016 ◽

2016 ◽

Vol 9 (10) ◽

pp. 3569-3588 ◽

Cited By ~ 28

Author(s):

Natasha MacBean ◽

Philippe Peylin ◽

Frédéric Chevallier ◽

Marko Scholze ◽

Gregor Schürmann

Keyword(s):

Data Assimilation ◽

Carbon Cycle ◽

Data Streams ◽

Land Surface ◽

Data Stream ◽

Correlated Errors ◽

Multiple Data ◽

Model Dynamics ◽

Error Distributions ◽

Multiple Data Streams

Abstract. Data assimilation (DA) methods provide a rigorous statistical framework for constraining parametric uncertainty in land surface models (LSMs), which in turn helps to improve their predictive capability and to identify areas in which the representation of physical processes is inadequate. The increase in the number of available datasets in recent years allows us to address different aspects of the model at a variety of spatial and temporal scales. However, combining data streams in a DA system is not a trivial task. In this study we highlight some of the challenges surrounding multiple data stream assimilation for the carbon cycle component of LSMs. We give particular consideration to the assumptions associated with the types of inversion algorithm that are typically used when optimising global LSMs, namely Gaussian error distributions and linearity in the model dynamics. We explore the effect of biases and inconsistencies between the observations and the model (resulting in non-Gaussian error distributions), and we examine the difference between a simultaneous assimilation (in which all data streams are included in one optimisation) and a step-wise approach (in which each data stream is assimilated sequentially) in the presence of non-linear model dynamics. In addition, we perform a preliminary investigation into the impact of correlated errors between two data streams for two cases: when the correlated observation errors are included in the prior observation error covariance matrix, and when the correlated errors are ignored. We demonstrate these challenges by assimilating synthetic observations into two simple models: the first a simplified version of the carbon cycle processes represented in many LSMs, and the second a non-linear toy model. Finally, we provide some perspectives and advice to other land surface modellers wishing to use multiple data streams to constrain their model parameters.
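The contrast between simultaneous and step-wise assimilation is easiest to see against the baseline case where both assumptions hold. The sketch below is not the paper's system: it assumes a single scalar parameter, linear observation operators, and Gaussian, mutually uncorrelated errors, and the function name `assimilate` and the `(y, h, r)` tuple layout are made up for illustration. Under those assumptions the two approaches give the same posterior, provided the posterior variance from the first step is carried into the second.

```python
def assimilate(prior_mean, prior_var, observations):
    """Bayesian update of one scalar parameter from linear observations.

    Each observation is a tuple (y, h, r): the observed value y = h * x + noise,
    where the noise is Gaussian with variance r.
    Observation errors are assumed mutually uncorrelated.
    Returns the posterior mean and posterior variance.
    """
    precision = 1.0 / prior_var          # inverse variance of the prior
    weighted = prior_mean / prior_var    # precision-weighted prior mean
    for y, h, r in observations:
        precision += h * h / r
        weighted += h * y / r
    return weighted / precision, 1.0 / precision


# Two hypothetical data streams constraining the same parameter.
stream_a = [(2.0, 1.0, 0.5), (2.4, 1.0, 0.5)]
stream_b = [(4.1, 2.0, 1.0)]

# Simultaneous: all data streams in one optimisation.
simultaneous = assimilate(1.0, 4.0, stream_a + stream_b)

# Step-wise: the posterior from stream A becomes the prior for stream B.
mean_a, var_a = assimilate(1.0, 4.0, stream_a)
stepwise = assimilate(mean_a, var_a, stream_b)
```

Here `simultaneous` and `stepwise` agree up to floating-point rounding. The sketch does not cover non-linear dynamics, biased observations, or correlated errors between streams, which are the cases the abstract examines.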


Empowering Density-based Micro-clusters In Dynamic Data Stream Clustering

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset207147 ◽

2020 ◽

pp. 259-259

Author(s):

Asha P. V. ◽

Anju M. Sukumar

Keyword(s):

Data Streams ◽

Data Clustering ◽

Data Stream ◽

Clustering Algorithms ◽

Streaming Data ◽

Data Stream Mining ◽

Stream Mining ◽

Fast Processing ◽

Geospatial Services ◽

Mining Data Streams

A data stream is a continuous sequence of data generated from various sources and continuously transferred from source to target. Streaming data must be processed without access to all of the data. Sources that generate data streams include social networks, geospatial services, weather monitoring, and e-commerce purchases. Data stream mining is the process of acquiring knowledge structures from continuously arriving data. Clustering is an unsupervised machine learning technique that can be used to extract knowledge patterns from a data stream. Mining streaming data is challenging because the data is large in volume and arrives continuously, so traditional algorithms are not suitable. Data stream mining requires fast algorithms that use a single scan and a limited amount of memory. Micro-clustering is well suited to these constraints, and density-based micro-clustering in particular has an important place in data stream mining. This paper presents a survey of data clustering algorithms and examines the use of density-based micro-clusters.
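The single-scan, limited-memory requirement can be illustrated with a minimal micro-cluster sketch. It does not reproduce any specific algorithm from the survey: the `MicroCluster` class, the fading rule, and the parameters `eps`, `factor`, and `min_weight` are assumptions, and a fuller summary would typically also keep a squared sum so that a radius can be computed.

```python
from math import dist


class MicroCluster:
    """Compact summary of nearby points: a faded weight and a faded linear sum."""

    def __init__(self, point):
        self.weight = 1.0
        self.linear_sum = list(point)

    def center(self):
        return [s / self.weight for s in self.linear_sum]

    def fade(self, factor):
        # Older points contribute less as new points arrive.
        self.weight *= factor
        self.linear_sum = [s * factor for s in self.linear_sum]

    def absorb(self, point):
        self.weight += 1.0
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]


def update(micro_clusters, point, eps=1.0, factor=0.9, min_weight=0.05):
    """Process one arriving point; the raw point is not stored afterwards."""
    for mc in micro_clusters:
        mc.fade(factor)
    # Discard summaries that have faded away, which keeps memory use limited.
    micro_clusters[:] = [mc for mc in micro_clusters if mc.weight >= min_weight]
    nearest = min(micro_clusters, key=lambda mc: dist(mc.center(), point), default=None)
    if nearest is not None and dist(nearest.center(), point) <= eps:
        nearest.absorb(point)
    else:
        micro_clusters.append(MicroCluster(point))
    return micro_clusters
```

Each point is touched once and then dropped; only the summaries persist, so memory depends on the number of live micro-clusters rather than on the length of the stream.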


Research issues in mining multiple data streams

Proceedings of the First International Workshop on Novel Data Stream Pattern Mining Techniques - StreamKDD '10 ◽

10.1145/1833280.1833288 ◽

2010 ◽

Cited By ~ 11

Author(s):

Wenyan Wu ◽

Le Gruenwald

Keyword(s):

Data Streams ◽

Research Issues ◽

Multiple Data ◽

Multiple Data Streams


Comparing multiple data streams to assess free-floating carsharing use

Transportation Research Procedia ◽

10.1016/j.trpro.2018.10.011 ◽

2018 ◽

Vol 32 ◽

pp. 617-626 ◽

Cited By ~ 1

Author(s):

Grzegorz Wielinski ◽

Martin Trépanier ◽

Catherine Morency ◽

Khandker Nurul Habib

Keyword(s):

Data Streams ◽

Multiple Data ◽

Multiple Data Streams


A clustering algorithm for multiple data streams based on spectral component similarity

Information Sciences ◽

10.1016/j.ins.2011.09.004 ◽

2012 ◽

Vol 183 (1) ◽

pp. 35-47 ◽

Cited By ~ 33

Author(s):

Ling Chen ◽

Ling-Jun Zou ◽

Li Tu

Keyword(s):

Data Streams ◽

Clustering Algorithm ◽

Spectral Component ◽

Multiple Data ◽

Multiple Data Streams


Energy Efficient Transmissions in Cognitive MIMO Systems With Multiple Data Streams

IEEE Transactions on Wireless Communications ◽

10.1109/twc.2015.2434372 ◽

2015 ◽

Vol 14 (9) ◽

pp. 5171-5184 ◽

Cited By ~ 8

Author(s):

Liqun Fu ◽

Mikael Johansson ◽

Mats Bengtsson

Keyword(s):

Data Streams ◽

Energy Efficient ◽

Mimo Systems ◽

Multiple Data ◽

Multiple Data Streams


Clustering Algorithm for Multiple Data Streams Based on Data Cloud Node

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.462-463.247 ◽

2013 ◽

Vol 462-463 ◽

pp. 247-250

Author(s):

Sa Li ◽

Liang Shan Shao

Keyword(s):

Data Streams ◽

Minimum Distance ◽

Clustering Algorithm ◽

Cloud Model ◽

Data Sequence ◽

Multiple Data ◽

Multiple Data Streams ◽

Model Algorithm

Multiple data stream clustering aims to cluster multiple data streams according to their similarity while tracking how they change over time. This paper proposes the M_SCCStream algorithm, which is based on the cloud model. The algorithm introduces a data cloud node structure with hierarchical characteristics to represent data sequences at different granularities, and uses entropy to indicate the degree of change in the data. It finds the micro_clustering with the minimum distance and then obtains the clustering result for the multiple data streams by calculating the correlation degrees of the micro_clustering. Experiments show that the algorithm achieves higher clustering quality and stability.
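Grouping streams by similarity, as opposed to grouping points within one stream, can be sketched as follows. This is not M_SCCStream and does not use the cloud model: it substitutes plain Pearson correlation over the current window and a threshold-based grouping, and the names `pearson`, `group_streams`, and `threshold` are assumptions for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a constant series; treat as unrelated
    return cov / sqrt(var_a * var_b)


def group_streams(windows, threshold=0.9):
    """Group stream ids whose current windows correlate at or above `threshold`.

    `windows` maps a stream id to the values in that stream's current window.
    Streams are linked transitively using a small union-find structure.
    """
    parent = {sid: sid for sid in windows}

    def find(sid):
        while parent[sid] != sid:
            parent[sid] = parent[parent[sid]]  # path halving
            sid = parent[sid]
        return sid

    ids = list(windows)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if pearson(windows[a], windows[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for sid in ids:
        groups.setdefault(find(sid), []).append(sid)
    return sorted(groups.values())
```

Re-running `group_streams` on each new window is one simple way to track how the grouping changes over time, at the cost of comparing every pair of streams.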


Knowledge Discovery From Evolving Data Streams

Advances in Business Information Systems and Analytics - Machine Learning Techniques for Improved Business Analytics ◽

10.4018/978-1-5225-3534-8.ch002 ◽

2019 ◽

pp. 19-39

Author(s):

Prasanna Lakshmi Kompalli

Keyword(s):

Real Time ◽

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Data Stream Mining ◽

Time Data ◽

Stream Mining ◽

New Challenges ◽

Mining Data Streams ◽

Different Sources

Data arriving continuously from different sources is referred to as a data stream. Data stream mining is an online learning technique in which each data point must be processed as it arrives and discarded once processing is complete. Advances in technology have made it possible to monitor these data streams in real time, which has created many new challenges for researchers. The main features of this type of data are that it is fast flowing, large in volume, continuous and growing, and that its characteristics may change over time, which is termed concept drift. This chapter addresses the problems in mining data streams with concept drift. Because of these problems, isolating the relevant literature can be a grueling task for researchers and practitioners. This chapter tries to provide a solution by bringing together the techniques used for data stream mining with concept drift.
