Data Stream Clustering Algorithms: Challenges and Future Directions

Applications today generate data in enormous volumes, known as data streams. A data stream is an extremely large, continuous, rapid flow of information. Clustering is an important tool in data mining, so data stream clustering (DSC) has become an active research area. Recent interest in data stream clustering is driven by applications that handle large amounts of streaming data. It is used in many areas, such as weather forecasting, financial transactions, website analysis, sensor network monitoring, e-business, telephone records and telecommunications. The most widely used heuristic in data stream clustering is K-means; other algorithms, such as K-medoids and the popular BIRCH, have also been developed. The aim of this work is to review the developments and trends in data stream clustering methods and to analyze typical DSC algorithms proposed in recent years, such as BIRCH, STREAM and DSTREAM, among others.
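As an illustration of how a K-means style heuristic can be applied to a stream, the following is a minimal sketch of sequential (online) K-means, in which each arriving point updates the nearest centroid and is then discarded. It is not taken from the reviewed paper; the class name `SequentialKMeans`, the method `learn_one`, and the choice to seed the centroids with the first k points are assumptions made for this example.

```python
import math


class SequentialKMeans:
    """Single-pass k-means: each point updates one centroid, then is discarded."""

    def __init__(self, k):
        self.k = k
        self.centroids = []   # list of coordinate lists
        self.counts = []      # number of points absorbed by each centroid

    def learn_one(self, x):
        x = list(x)
        # Assumption: the first k points seed the centroids.
        if len(self.centroids) < self.k:
            self.centroids.append(x)
            self.counts.append(1)
            return len(self.centroids) - 1
        # Assign the point to the nearest centroid (Euclidean distance).
        j = min(range(self.k), key=lambda i: math.dist(self.centroids[i], x))
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]
        # Move the centroid toward x so it stays the running mean of its points.
        self.centroids[j] = [c + eta * (xi - c) for c, xi in zip(self.centroids[j], x)]
        return j


model = SequentialKMeans(k=2)
for point in [(0.0, 0.0), (10.0, 10.0), (0.0, 2.0), (10.0, 12.0)]:
    model.learn_one(point)
```

Memory use depends only on k and the dimensionality, not on the length of the stream, which is why this style of update suits streaming data.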

EvolveCluster: an evolutionary clustering algorithm for streaming data

Evolving Systems ◽

10.1007/s12530-021-09408-y ◽

2021 ◽

Author(s):

Christian Nordahl ◽

Veselka Boeva ◽

Håkan Grahn ◽

Marie Persson Netz

Keyword(s):

Data Streams ◽

Data Stream ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Streaming Data ◽

Evolutionary Clustering ◽

Stream Clustering ◽

The Past ◽

Data Stream Clustering ◽

Evolving Data

Data has become an integral part of our society in the past years, arriving faster and in larger quantities than before. Traditional clustering algorithms rely on the availability of entire datasets to model them correctly and efficiently. Such requirements are not possible in the data stream clustering scenario, where data arrives and needs to be analyzed continuously. This paper proposes a novel evolutionary clustering algorithm, entitled EvolveCluster, capable of modeling evolving data streams. We compare EvolveCluster against two other evolutionary clustering algorithms, PivotBiCluster and Split-Merge Evolutionary Clustering, by conducting experiments on three different datasets. Furthermore, we perform additional experiments on EvolveCluster to further evaluate its capabilities on clustering evolving data streams. Our results show that EvolveCluster manages to capture evolving data stream behaviors and adapts accordingly.

Concept Drift Detection in Data Stream Clustering and its Application on Weather Data

International Journal of Agricultural and Environmental Information Systems ◽

10.4018/ijaeis.2020010104 ◽

2020 ◽

Vol 11 (1) ◽

pp. 67-85 ◽

Cited By ~ 1

Author(s):

Namitha K. ◽

Santhosh Kumar G.

Keyword(s):

Data Streams ◽

Data Stream ◽

Weather Forecasting ◽

Concept Drift ◽

Clustering Algorithms ◽

Weather Data ◽

Stream Clustering ◽

Cluster Evolution ◽

Data Stream Clustering ◽

Concept Drift Detection

This article presents a stream mining framework to cluster a data stream and monitor its evolution. Even though concept drift is expected to be present in data streams, explicit drift detection is rarely done in stream clustering algorithms. The proposed framework is capable of explicit concept drift detection and cluster evolution analysis. Concept drift is caused by changes in the data distribution over time. The relationship between concept drift and the occurrence of physical events has been studied by applying the framework to a weather data stream. Experiments led to the conclusion that concept drift accompanied by a change in the number of clusters indicates a significant weather event. This kind of online monitoring and its results can be utilized in weather forecasting systems in various ways. Weather data streams produced by automatic weather stations (AWS) are used to conduct this study.
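The conclusion above, that drift together with a change in the number of clusters signals a significant event, can be expressed as a simple rule over per-window summaries. The sketch below is only an illustration of that rule, not the framework from the article: the `WindowSummary` record, its fields, and the function `significant_windows` are hypothetical, and the drift test that would set `drift_detected` is left unspecified.

```python
from dataclasses import dataclass


@dataclass
class WindowSummary:
    # Hypothetical per-window output of a stream clustering step.
    index: int
    drift_detected: bool   # result of some explicit drift test (not specified here)
    n_clusters: int


def significant_windows(summaries):
    """Return indices of windows where drift coincides with a change in cluster count."""
    flagged = []
    for prev, cur in zip(summaries, summaries[1:]):
        if cur.drift_detected and cur.n_clusters != prev.n_clusters:
            flagged.append(cur.index)
    return flagged


history = [
    WindowSummary(0, False, 3),
    WindowSummary(1, True, 3),    # drift, same number of clusters
    WindowSummary(2, False, 4),   # cluster count changed, no drift flagged
    WindowSummary(3, True, 2),    # drift and count change together
]
print(significant_windows(history))
```

Only the last window is flagged, because it is the only one where both conditions hold at once.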

IMPROVED DENSITY BASED ALGORITHM FOR DATA STREAM CLUSTERING

Jurnal Teknologi ◽

10.11113/jt.v77.6492 ◽

2015 ◽

Vol 77 (18) ◽

Cited By ~ 2

Author(s):

Maryam Mousavi ◽

Azuraliza Abu Bakar

Keyword(s):

Data Streams ◽

Data Stream ◽

Clustering Algorithm ◽

Local Density ◽

Clustering Methods ◽

Clustering Techniques ◽

Stream Clustering ◽

Density Based Clustering ◽

Clustering Quality ◽

Data Stream Clustering

In recent years, clustering methods have attracted more attention for analysing and monitoring data streams. Density-based techniques are a notable category of clustering techniques that can detect clusters of arbitrary shape and handle noise. However, finding clusters with varying local densities is a difficult task. To handle this problem, this paper proposes a new density-based clustering algorithm for data streams. The algorithm improves the offline phase of a density-based algorithm based on the MinPts parameter. The experimental results show that the proposed technique can improve the clustering quality in data streams with different densities.
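To make the role of MinPts concrete, here is a minimal sketch of the standard DBSCAN core-point test, in which a point is a core point if at least MinPts points (itself included) lie within distance eps. This is the textbook definition, not the improved algorithm proposed in the paper; the function name `core_points` and the sample data are assumptions for illustration.

```python
import math


def core_points(points, eps, min_pts):
    """Indices of points with at least min_pts neighbours (itself included) within eps."""
    cores = []
    for i, p in enumerate(points):
        neighbours = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbours >= min_pts:
            cores.append(i)
    return cores


dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
sparse = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
data = dense + sparse

strict = core_points(data, eps=0.5, min_pts=4)   # tuned for the dense group
loose = core_points(data, eps=1.5, min_pts=3)    # tuned for the sparse group
```

With the strict setting only the dense group yields core points, so the sparse group would be treated as noise. A single global parameter pair therefore struggles when local densities differ, which is the difficulty the abstract describes.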

A Review of Uncertain Data Stream Clustering Algorithms

2015 Eighth International Conference on Internet Computing for Science and Engineering (ICICSE) ◽

10.1109/icicse.2015.30 ◽

2015 ◽

Cited By ~ 2

Author(s):

Yue Yang ◽

Zhuo Liu ◽

Zhidan Xing

Keyword(s):

Data Stream ◽

Clustering Algorithms ◽

Uncertain Data ◽

Stream Clustering ◽

Data Stream Clustering

A Survey on Density based Micro-clustering Algorithms for Data Stream Clustering

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse/v7i1/0111 ◽

2017 ◽

Vol 7 (1) ◽

pp. 186-190 ◽

Cited By ~ 1

Author(s):

Donia Augustine ◽

Keyword(s):

Data Stream ◽

Clustering Algorithms ◽

Stream Clustering ◽

Data Stream Clustering

A Comparative Study on Data Stream Clustering Algorithms

Lecture Notes on Data Engineering and Communications Technologies - Proceeding of the International Conference on Computer Networks, Big Data and IoT (ICCBI - 2018) ◽

10.1007/978-3-030-24643-3_27 ◽

2019 ◽

pp. 219-230

Author(s):

Twinkle Keshvani ◽

Madhu Shukla

Keyword(s):

Comparative Study ◽

Data Stream ◽

Clustering Algorithms ◽

Stream Clustering ◽

Data Stream Clustering

An evaluation of data stream clustering algorithms

Statistical Analysis and Data Mining: The ASA Data Science Journal ◽

10.1002/sam.11380 ◽

2018 ◽

Vol 11 (4) ◽

pp. 167-187 ◽

Cited By ~ 11

Author(s):

Stratos Mansalis ◽

Eirini Ntoutsi ◽

Nikos Pelekis ◽

Yannis Theodoridis

Keyword(s):

Data Stream ◽

Clustering Algorithms ◽

Stream Clustering ◽

Data Stream Clustering ◽

Evaluation Of Data

Research on data stream clustering algorithms

Artificial Intelligence Review ◽

10.1007/s10462-013-9398-7 ◽

2013 ◽

Vol 43 (4) ◽

pp. 593-600 ◽

Cited By ~ 32

Author(s):

Shifei Ding ◽

Fulin Wu ◽

Jun Qian ◽

Hongjie Jia ◽

Fengxiang Jin

Keyword(s):

Data Stream ◽

Clustering Algorithms ◽

Stream Clustering ◽

Data Stream Clustering

Constraint-based discriminative dimension selection for high-dimensional stream clustering

International Journal of Advances in Intelligent Informatics ◽

10.26555/ijain.v4i3.271 ◽

2018 ◽

Vol 4 (3) ◽

pp. 167

Author(s):

Kitsana Waiyamai ◽

Thanapat Kangkachit

Keyword(s):

Data Streams ◽

Clustering Algorithm ◽

Expert Knowledge ◽

Clustering Algorithms ◽

Clustering Methods ◽

Dynamic Constraints ◽

Stream Clustering ◽

Clustering Quality ◽

Active Research ◽

Clustering Data

Clustering data streams is an active research topic in data mining. However, the runtime of existing stream clustering algorithms increases, and their performance drops, when the number of dimensions is large. The complexity of stream clustering methods also increases when they are applied to data with a large number of dimensions. One possible way to reduce the clustering complexity is to determine an appropriate subset of cluster dimensions via dimension projection. SED-Stream is an efficient clustering algorithm that supports high-dimensional data streams. The aim of this paper is to improve the performance of SED-Stream in terms of both clustering quality and execution time. To improve the clustering process, background or domain expert knowledge is integrated as “constraints” in SEDC-Stream. The new algorithm, SEDC-Stream, supports the evolving characteristics of dynamic constraints, namely activation, fading, outdating and prioritization. SEDC-Stream is able to reduce cluster splitting time and to place new incoming points into suitable clusters. Compared with SED-Stream on three real-world stream datasets, SEDC-Stream achieves better clustering performance in terms of both purity and F-measure.

Data Stream Clustering Techniques, Applications, and Models: Comparative Analysis and Discussion

Big Data and Cognitive Computing ◽

10.3390/bdcc2040032 ◽

2018 ◽

Vol 2 (4) ◽

pp. 32 ◽

Cited By ~ 11

Author(s):

Umesh Kokate ◽

Arvind Deshpande ◽

Parikshit Mahalle ◽

Pramod Patil

Keyword(s):

Comparative Analysis ◽

Data Streams ◽

Smart Grids ◽

Data Stream ◽

Concept Drift ◽

Clustering Algorithms ◽

Medical Science ◽

Data Set ◽

Stream Clustering ◽

Data Stream Clustering

Data growth in today’s world is exponential. Many applications, such as smart grids, sensor networks, video surveillance, financial systems, medical science data, web click streams and network data, generate huge amounts of streaming data at very high speed. In traditional data mining, the data set is generally static and can be accessed multiple times for processing and analysis. Data stream mining, however, has to satisfy constraints related to real-time response, bounded and limited memory, single-pass processing, and concept-drift detection. The main problem is identifying hidden patterns and knowledge in continuous data streams in order to understand the context and identify trends. In this paper, various data stream methods and algorithms are reviewed and evaluated on standard synthetic data streams and real-life data streams. Density-based micro-clustering and density-grid-based clustering algorithms are discussed, and a comparative analysis is performed using various internal and external clustering evaluation methods. It was observed that no single algorithm can satisfy all the performance measures. The performance of these data stream clustering algorithms is domain-specific and depends on many parameters for density and noise thresholds.
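The single-pass and bounded-memory constraints are commonly met by keeping a fixed-size summary of each micro-cluster instead of the raw points. The sketch below shows one such summary, a clustering feature holding the count, linear sum and squared sum, as used in BIRCH-style and micro-clustering methods. It is an illustrative sketch rather than any specific algorithm evaluated in the paper; the class name `ClusterFeature` and its methods are assumptions for this example.

```python
import math


class ClusterFeature:
    """Fixed-size summary (n, linear sum, squared sum) of the points absorbed so far."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim   # per-dimension sum of values
        self.ss = [0.0] * dim   # per-dimension sum of squared values

    def add(self, x):
        self.n += 1
        for d, v in enumerate(x):
            self.ls[d] += v
            self.ss[d] += v * v

    def merge(self, other):
        # Summaries are additive, so two micro-clusters merge without raw data.
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss = [a + b for a, b in zip(self.ss, other.ss)]

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # Root-mean-square distance of the absorbed points from the centroid.
        var = sum(ss / self.n - (ls / self.n) ** 2 for ss, ls in zip(self.ss, self.ls))
        return math.sqrt(max(var, 0.0))


cf = ClusterFeature(dim=2)
for point in [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0), (3.0, 3.0)]:
    cf.add(point)   # each point is read once and then discarded
```

Because the summary has constant size per cluster, the centroid and radius remain available however long the stream runs, and density or noise thresholds can be applied to `n` and `radius()` without revisiting earlier data.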