Hierarchical clustering for multiple nominal data streams with evolving behaviour

Complex & Intelligent Systems ◽

10.1007/s40747-021-00634-0 ◽

2022 ◽

Author(s):

Jerry W. Sangma ◽

Mekhla Sarkar ◽

Vipin Pal ◽

Amit Agrawal ◽

Yogita

Keyword(s):

Hierarchical Clustering ◽

Data Streams ◽

Data Stream ◽

Research Work ◽

Entropy Measure ◽

Cophenetic Correlation ◽

Clustering Technique ◽

Multiple Data ◽

Variable Approach ◽

Multiple Data Streams

Abstract. Over the past decade, a number of attempts have been made at data stream clustering, but most of these works follow the clustering-by-example approach. A number of applications instead require the clustering-by-variable approach, which clusters multiple data streams rather than the data examples within a single stream. Furthermore, only a few works have been presented for clustering multiple data streams, and these are applicable to numeric data streams only. This research gap has motivated the current work, which proposes a hierarchical clustering technique for multiple data streams whose data are nominal. To address concept changes in the data streams, clusters in the hierarchical structure are split and merged. The decision to split or merge is based on an entropy measure that represents the cluster's degree of disparity. The performance of the proposed technique has been analysed and compared to the Agglomerative Nesting clustering technique on synthetic datasets as well as a real-world dataset in terms of Dunn Index, Modified Hubert Γ statistic, Cophenetic Correlation Coefficient, and Purity. The proposed technique outperforms the Agglomerative Nesting clustering technique for concept-evolving data streams. Furthermore, the effect of concept evolution on the clustering structure and the average entropy has been visualised for detailed analysis and understanding.
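
The abstract does not give the entropy formula, so the following is only a minimal sketch of how an entropy-based split decision for a cluster of nominal streams might look. It assumes plain Shannon entropy, computed across the streams at each time step of a window and then averaged; the function names, the window layout, and the `threshold` value are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of a non-empty sequence of nominal symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def cluster_entropy(streams):
    """Average per-time-step entropy across the streams of one cluster.

    `streams` is a list of equal-length, non-empty windows, one per stream.
    This averaging scheme is an assumption, not the paper's definition.
    """
    steps = list(zip(*streams))  # symbols seen by all streams at each step
    return sum(shannon_entropy(step) for step in steps) / len(steps)


def should_split(streams, threshold=0.5):
    """Flag a cluster for splitting when its disparity exceeds the threshold."""
    return cluster_entropy(streams) > threshold


# Three streams that agree at every step: no disparity.
agreeing = [list("aabb"), list("aabb"), list("aabb")]
# One stream has drifted away from the other two.
mixed = [list("aabb"), list("bbaa"), list("aabb")]

print(cluster_entropy(agreeing), should_split(agreeing))
print(round(cluster_entropy(mixed), 4), should_split(mixed))
```

Under these assumptions a cluster whose streams always emit the same symbol has zero entropy, while a cluster containing a drifted stream scores higher and becomes a candidate for splitting. A merge test could apply the same measure to the union of two sibling clusters.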

Incremental Algorithm for Discovering Frequent Subsequences in Multiple Data Streams

International Journal of Data Warehousing and Mining ◽

10.4018/jdwm.2011100101 ◽

2011 ◽

Vol 7 (4) ◽

pp. 1-20 ◽

Cited By ~ 2

Author(s):

Reem Al-Mulla ◽

Zaher Al Aghbari

Keyword(s):

Data Streams ◽

Data Stream ◽

Clustering Algorithms ◽

Arrival Rate ◽

Time Algorithm ◽

Incremental Algorithm ◽

Multiple Data ◽

Multiple Data Streams ◽

Mining Data Streams ◽

New Applications

In recent years, new applications have emerged that produce data streams, such as stock data and sensor networks. Finding frequent subsequences, or clusters of subsequences, in data streams is therefore an essential task in data mining. Data streams are continuous in nature, unbounded in size, and have a high arrival rate. Due to these characteristics, traditional clustering algorithms fail to find clusters in data streams effectively. Thus, an efficient incremental algorithm is proposed to find frequent subsequences in multiple data streams. The approach finds frequent subsequences by clustering the subsequences of a data stream. The proposed algorithm uses a window model to buffer the continuous data streams. It does not recompute the clustering results for the whole data stream at every window, but rather builds on the clustering results of previous windows. The proposed approach also employs a decay value for each discovered cluster to determine when to remove old clusters and retain recent ones. In addition, the proposed algorithm is efficient because it scans the data streams only once, and it is considered an anytime algorithm because the frequent subsequences are ready at the end of every window.
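
The window-plus-decay idea can be illustrated with a small sketch. It is not the authors' algorithm: for simplicity it assumes that a "cluster" is just a set of identical fixed-length subsequences (exact matching stands in for whatever similarity measure the paper uses), and the names `update_clusters`, `decay`, and `min_weight` are hypothetical.

```python
from collections import Counter


def subsequences(window, length):
    """All contiguous subsequences of the given length in one window."""
    return [tuple(window[i:i + length]) for i in range(len(window) - length + 1)]


def update_clusters(clusters, window, length=2, decay=0.5, min_weight=1.0):
    """Fold one window into the running clusters and return the updated dict.

    Existing cluster weights are decayed, counts from the new window are added,
    and clusters whose weight falls below `min_weight` are dropped. Earlier
    windows are never rescanned.
    """
    # Age every existing cluster before adding the new evidence.
    aged = {key: weight * decay for key, weight in clusters.items()}
    for key, count in Counter(subsequences(window, length)).items():
        aged[key] = aged.get(key, 0.0) + count
    # Keep only clusters that are still recent or frequent enough.
    return {key: weight for key, weight in aged.items() if weight >= min_weight}


clusters = {}
for window in (["a", "b", "a", "b"], ["a", "b", "c", "d"], ["c", "d", "c", "d"]):
    clusters = update_clusters(clusters, window)
    print(clusters)  # results are available at the end of every window
```

Each window is scanned once and the result is usable immediately afterwards, which mirrors the single-pass and anytime properties described in the abstract. Subsequences that stop appearing lose weight at every window and are eventually removed.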

Incremental Algorithm for Discovering Frequent Subsequences in Multiple Data Streams

Developments in Data Extraction, Management, and Analysis ◽

10.4018/978-1-4666-2148-0.ch012 ◽

2013 ◽

pp. 259-279

Author(s):

Reem Al-Mulla ◽

Zaher Al Aghbari

Keyword(s):

Data Streams ◽

Data Stream ◽

Clustering Algorithms ◽

Arrival Rate ◽

Time Algorithm ◽

Incremental Algorithm ◽

Multiple Data ◽

Multiple Data Streams ◽

Mining Data Streams ◽

New Applications

Consistent assimilation of multiple data streams in a carbon cycle data assimilation system

Geoscientific Model Development ◽

10.5194/gmd-9-3569-2016 ◽

2016 ◽

Vol 9 (10) ◽

pp. 3569-3588 ◽

Cited By ~ 28

Author(s):

Natasha MacBean ◽

Philippe Peylin ◽

Frédéric Chevallier ◽

Marko Scholze ◽

Gregor Schürmann

Keyword(s):

Data Assimilation ◽

Carbon Cycle ◽

Data Streams ◽

Land Surface ◽

Data Stream ◽

Correlated Errors ◽

Multiple Data ◽

Model Dynamics ◽

Error Distributions ◽

Multiple Data Streams

Abstract. Data assimilation methods provide a rigorous statistical framework for constraining parametric uncertainty in land surface models (LSMs), which in turn helps to improve their predictive capability and to identify areas in which the representation of physical processes is inadequate. The increase in the number of available datasets in recent years allows us to address different aspects of the model at a variety of spatial and temporal scales. However, combining data streams in a data assimilation (DA) system is not a trivial task. In this study we highlight some of the challenges surrounding multiple data stream assimilation for the carbon cycle component of LSMs. We give particular consideration to the assumptions associated with the type of inversion algorithm that is typically used when optimising global LSMs, namely Gaussian error distributions and linearity in the model dynamics. We explore the effect of biases and inconsistencies between the observations and the model (resulting in non-Gaussian error distributions), and we examine the difference between a simultaneous assimilation (in which all data streams are included in one optimisation) and a step-wise approach (in which each data stream is assimilated sequentially) in the presence of non-linear model dynamics. In addition, we perform a preliminary investigation into the impact of correlated errors between two data streams for two cases: when the correlated observation errors are included in the prior observation error covariance matrix, and when the correlated errors are ignored. We demonstrate these challenges by assimilating synthetic observations into two simple models: the first a simplified version of the carbon cycle processes represented in many LSMs and the second a non-linear toy model. Finally, we provide some perspectives and advice to other land surface modellers wishing to use multiple data streams to constrain their model parameters.
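
As a point of reference for the simultaneous versus step-wise comparison, the sketch below works through the idealised baseline in which both assumptions named in the abstract hold: a linear observation operator and independent Gaussian errors. It is a scalar toy case and not one of the paper's models; the function name and all numbers are hypothetical. In this setting the two strategies give the same posterior, provided the step-wise approach carries the posterior variance of the first step forward as the prior of the second.

```python
def assimilate(prior_mean, prior_var, observations):
    """Gaussian update of a scalar parameter that is observed directly.

    `observations` is a list of (value, error_variance) pairs whose errors
    are assumed independent. Returns (posterior_mean, posterior_variance).
    """
    precision = 1.0 / prior_var + sum(1.0 / var for _, var in observations)
    weighted = prior_mean / prior_var + sum(val / var for val, var in observations)
    post_var = 1.0 / precision
    return weighted * post_var, post_var


# Two hypothetical data streams observing the same parameter.
stream_a = [(1.2, 0.5)]
stream_b = [(0.8, 0.25)]

# Simultaneous: both streams enter one optimisation.
simultaneous = assimilate(0.0, 1.0, stream_a + stream_b)

# Step-wise: the posterior from stream A becomes the prior for stream B.
step_mean, step_var = assimilate(0.0, 1.0, stream_a)
stepwise = assimilate(step_mean, step_var, stream_b)

print(simultaneous)
print(stepwise)
```

The agreement relies on linearity, Gaussianity, uncorrelated errors, and full propagation of the intermediate uncertainty. The abstract's focus is on what happens when those conditions are relaxed, which this sketch does not attempt to reproduce.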

Research issues in mining multiple data streams

Proceedings of the First International Workshop on Novel Data Stream Pattern Mining Techniques - StreamKDD '10 ◽

10.1145/1833280.1833288 ◽

2010 ◽

Cited By ~ 11

Author(s):

Wenyan Wu ◽

Le Gruenwald

Keyword(s):

Data Streams ◽

Research Issues ◽

Multiple Data ◽

Multiple Data Streams

Comparing multiple data streams to assess free-floating carsharing use

Transportation Research Procedia ◽

10.1016/j.trpro.2018.10.011 ◽

2018 ◽

Vol 32 ◽

pp. 617-626 ◽

Cited By ~ 1

Author(s):

Grzegorz Wielinski ◽

Martin Trépanier ◽

Catherine Morency ◽

Khandker Nurul Habib

Keyword(s):

Data Streams ◽

Multiple Data ◽

Multiple Data Streams

A clustering algorithm for multiple data streams based on spectral component similarity

Information Sciences ◽

10.1016/j.ins.2011.09.004 ◽

2012 ◽

Vol 183 (1) ◽

pp. 35-47 ◽

Cited By ~ 33

Author(s):

Ling Chen ◽

Ling-Jun Zou ◽

Li Tu

Keyword(s):

Data Streams ◽

Clustering Algorithm ◽

Spectral Component ◽

Multiple Data ◽

Multiple Data Streams

Energy Efficient Transmissions in Cognitive MIMO Systems With Multiple Data Streams

IEEE Transactions on Wireless Communications ◽

10.1109/twc.2015.2434372 ◽

2015 ◽

Vol 14 (9) ◽

pp. 5171-5184 ◽

Cited By ~ 8

Author(s):

Liqun Fu ◽

Mikael Johansson ◽

Mats Bengtsson

Keyword(s):

Data Streams ◽

Energy Efficient ◽

Mimo Systems ◽

Multiple Data ◽

Multiple Data Streams

Clustering Algorithm for Multiple Data Streams Based on Data Cloud Node

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.462-463.247 ◽

2013 ◽

Vol 462-463 ◽

pp. 247-250

Author(s):

Sa Li ◽

Liang Shan Shao

Keyword(s):

Data Streams ◽

Minimum Distance ◽

Clustering Algorithm ◽

Cloud Model ◽

Data Sequence ◽

Multiple Data ◽

Multiple Data Streams ◽

Model Algorithm

Multiple data streams clustering aims to cluster multiple data streams according to their similarity while tracking their changes over time. This paper proposes the M_SCCStream algorithm, which is based on the cloud model. The algorithm introduces a data cloud node structure with hierarchical characteristics to represent data sequences at different granularities, and it uses entropy to indicate the degree of data change. The algorithm finds the micro_clustering with the minimum distance and then obtains the clustering result for the multiple data streams by calculating the correlation degrees of the micro_clustering. The experiments indicate that the algorithm has higher quality and stability.

Design and Implementation of an Ultrasonic Link for Concurrent Telemetry of Multiple Data Streams to Implantable Biomedical Microsystems

2018 IEEE 61st International Midwest Symposium on Circuits and Systems (MWSCAS) ◽

10.1109/mwscas.2018.8624051 ◽

2018 ◽

Author(s):

Keivan Keramatzadeh ◽

Amir M. Sodagar

Keyword(s):

Data Streams ◽

Design And Implementation ◽

Multiple Data ◽

Multiple Data Streams ◽

Biomedical Microsystems

Monitoring forest cover loss using multiple data streams, a case study of a tropical dry forest in Bolivia

ISPRS Journal of Photogrammetry and Remote Sensing ◽

10.1016/j.isprsjprs.2015.03.015 ◽

2015 ◽

Vol 107 ◽

pp. 112-125 ◽

Cited By ~ 57

Author(s):

Loïc Paul Dutrieux ◽

Jan Verbesselt ◽

Lammert Kooistra ◽

Martin Herold

Keyword(s):

Data Streams ◽

Forest Cover ◽

Tropical Dry Forest ◽

Dry Forest ◽

Multiple Data ◽

Multiple Data Streams ◽

Forest Cover Loss
