A Review of Uncertain Data Stream Clustering Algorithms

Abstract: Data has become an integral part of our society in the past years, arriving faster and in larger quantities than before. Traditional clustering algorithms rely on the availability of entire datasets to model them correctly and efficiently. Such requirements are not possible in the data stream clustering scenario, where data arrives and needs to be analyzed continuously. This paper proposes a novel evolutionary clustering algorithm, entitled EvolveCluster, capable of modeling evolving data streams. We compare EvolveCluster against two other evolutionary clustering algorithms, PivotBiCluster and Split-Merge Evolutionary Clustering, by conducting experiments on three different datasets. Furthermore, we perform additional experiments on EvolveCluster to further evaluate its capabilities on clustering evolving data streams. Our results show that EvolveCluster manages to capture evolving data stream behaviors and adapts accordingly.

DDEUDSC: A Dynamic Distance Estimation using Uncertain Data Stream Clustering in mobile wireless sensor networks

Measurement ◽

10.1016/j.measurement.2014.05.040 ◽

2014 ◽

Vol 55 ◽

pp. 423-433 ◽

Cited By ~ 15

Author(s):

Qinghua Luo ◽

Xiaozhen Yan ◽

Junbao Li ◽

Yu Peng

Keyword(s):

Wireless Sensor Networks ◽

Sensor Networks ◽

Data Stream ◽

Uncertain Data ◽

Distance Estimation ◽

Wireless Sensor ◽

Mobile Wireless ◽

Stream Clustering ◽

Data Stream Clustering ◽

Mobile Wireless Sensor

Concept Drift Detection in Data Stream Clustering and its Application on Weather Data

International Journal of Agricultural and Environmental Information Systems ◽

10.4018/ijaeis.2020010104 ◽

2020 ◽

Vol 11 (1) ◽

pp. 67-85 ◽

Cited By ~ 1

Author(s):

Namitha K. ◽

Santhosh Kumar G.

Keyword(s):

Data Streams ◽

Data Stream ◽

Weather Forecasting ◽

Concept Drift ◽

Clustering Algorithms ◽

Weather Data ◽

Stream Clustering ◽

Cluster Evolution ◽

Data Stream Clustering ◽

Concept Drift Detection

This article presents a stream mining framework that clusters a data stream and monitors its evolution. Although concept drift is expected in data streams, stream clustering algorithms rarely detect it explicitly. The proposed framework performs explicit concept drift detection and cluster evolution analysis. Concept drift is caused by changes in the data distribution over time. The relationship between concept drift and the occurrence of physical events is studied by applying the framework to weather data streams produced by automatic weather stations (AWS). The experiments led to the conclusion that concept drift accompanied by a change in the number of clusters indicates a significant weather event. This kind of online monitoring and its results can be used in weather forecasting systems in various ways.

Queries for Uncertain Data on Dataspace Based on Effective Clustering Algorithm

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.380-384.1529 ◽

2013 ◽

Vol 380-384 ◽

pp. 1529-1532

Author(s):

Shuang Zhang ◽

Shi Xiong Zhang

Keyword(s):

Data Stream ◽

Clustering Algorithm ◽

Uncertain Data ◽

Effective Strategy ◽

Clustering Method ◽

Probabilistic Data ◽

Stream Clustering ◽

Data Stream Clustering ◽

Strong Cluster ◽

First Time

This paper presents P-Stream, an effective clustering algorithm for probabilistic data streams, developed here for the first time. For the uncertain tuples in the data stream, P-Stream introduces the concepts of strong cluster, transitional cluster, and weak cluster. With these concepts, an effective strategy for choosing candidate clusters is designed, which can find a sound cluster for every continuously arriving data point. The paper also systematically defines the dataspace and uncertain data, and proposes an updated algorithm for queries on uncertain data based on the effective clustering algorithm.

Data Stream Clustering Algorithms: Challenges and Future Directions

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k1990.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 3676-3681

Keyword(s):

Data Stream ◽

Weather Forecasting ◽

Clustering Algorithms ◽

Research Area ◽

Streaming Data ◽

Clustering Methods ◽

Stream Clustering ◽

Data Stream Clustering ◽

Active Research ◽

Financial Transactions

In a fast-growing world, applications generate data in enormous volumes, called data streams. A data stream is an extremely large, continuous, rapid flow of information. Because clustering is an important tool in data mining, data stream clustering (DSC) is an active research area. Recent attention to data stream clustering comes from applications that contain large amounts of streaming data. Data stream clustering is used in many areas, such as weather forecasting, financial transactions, website analysis, sensor network monitoring, e-business, telephone records, and telecommunications. The most popular heuristic for data stream clustering is K-means; other algorithms, such as K-medoids and the popular BIRCH, have also been developed. The aim of this paper is to review the developments and trends in data stream clustering methods and to analyze typical DSC algorithms proposed in recent years, such as BIRCH, STREAM, DSTREAM, and others.
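
To illustrate how a K-means style heuristic can run on a stream, the following is a minimal sketch of sequential (online) K-means, in which each arriving point moves its nearest centroid by a shrinking step. It is not taken from the reviewed paper; the class name `SequentialKMeans`, its methods, and the choice to count each seed as one observed point are assumptions made for illustration.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class SequentialKMeans:
    """Hypothetical helper: one-pass K-means keeping only k centroids and k counters."""

    def __init__(self, seeds: Sequence[Sequence[float]]) -> None:
        # Assumption: each seed counts as one point already seen by its cluster.
        self.centroids: List[List[float]] = [list(s) for s in seeds]
        self.counts: List[int] = [1] * len(self.centroids)

    def update(self, point: Sequence[float]) -> int:
        """Assign one arriving point to its nearest centroid and update that centroid."""
        nearest = min(
            range(len(self.centroids)),
            key=lambda i: squared_distance(self.centroids[i], point),
        )
        self.counts[nearest] += 1
        step = 1.0 / self.counts[nearest]  # running-mean step size
        self.centroids[nearest] = [
            c + step * (x - c) for c, x in zip(self.centroids[nearest], point)
        ]
        return nearest


model = SequentialKMeans([[0.0, 0.0], [10.0, 10.0]])
for p in ([2.0, 2.0], [9.0, 9.0]):
    model.update(p)
```

Each point is read once and then discarded, so memory stays proportional to the number of centroids rather than to the length of the stream.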

A Survey on Density based Micro-clustering Algorithms for Data Stream Clustering

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse/v7i1/0111 ◽

2017 ◽

Vol 7 (1) ◽

pp. 186-190 ◽

Cited By ~ 1

Author(s):

Donia Augustine

Keyword(s):

Data Stream ◽

Clustering Algorithms ◽

Stream Clustering ◽

Data Stream Clustering

A Comparative Study on Data Stream Clustering Algorithms

Lecture Notes on Data Engineering and Communications Technologies - Proceeding of the International Conference on Computer Networks, Big Data and IoT (ICCBI - 2018) ◽

10.1007/978-3-030-24643-3_27 ◽

2019 ◽

pp. 219-230

Author(s):

Twinkle Keshvani ◽

Madhu Shukla

Keyword(s):

Comparative Study ◽

Data Stream ◽

Clustering Algorithms ◽

Stream Clustering ◽

Data Stream Clustering

An evaluation of data stream clustering algorithms

Statistical Analysis and Data Mining: The ASA Data Science Journal ◽

10.1002/sam.11380 ◽

2018 ◽

Vol 11 (4) ◽

pp. 167-187 ◽

Cited By ~ 11

Author(s):

Stratos Mansalis ◽

Eirini Ntoutsi ◽

Nikos Pelekis ◽

Yannis Theodoridis

Keyword(s):

Data Stream ◽

Clustering Algorithms ◽

Stream Clustering ◽

Data Stream Clustering ◽

Evaluation Of Data

Research on data stream clustering algorithms

Artificial Intelligence Review ◽

10.1007/s10462-013-9398-7 ◽

2013 ◽

Vol 43 (4) ◽

pp. 593-600 ◽

Cited By ~ 32

Author(s):

Shifei Ding ◽

Fulin Wu ◽

Jun Qian ◽

Hongjie Jia ◽

Fengxiang Jin

Keyword(s):

Data Stream ◽

Clustering Algorithms ◽

Stream Clustering ◽

Data Stream Clustering

Data Stream Clustering Techniques, Applications, and Models: Comparative Analysis and Discussion

Big Data and Cognitive Computing ◽

10.3390/bdcc2040032 ◽

2018 ◽

Vol 2 (4) ◽

pp. 32 ◽

Cited By ~ 11

Author(s):

Umesh Kokate ◽

Arvind Deshpande ◽

Parikshit Mahalle ◽

Pramod Patil

Keyword(s):

Comparative Analysis ◽

Data Streams ◽

Smart Grids ◽

Data Stream ◽

Concept Drift ◽

Clustering Algorithms ◽

Medical Science ◽

Data Set ◽

Stream Clustering ◽

Data Stream Clustering

Data growth in today’s world is exponential: many applications, such as smart grids, sensor networks, video surveillance, financial systems, medical science data, web click streams, and network data, generate huge amounts of streaming data at very high speed. In traditional data mining, the data set is generally static and can be read many times for processing and analysis. Data stream mining, however, has to satisfy constraints related to real-time response, bounded memory, single-pass processing, and concept drift detection. The main problem is identifying hidden patterns and knowledge in order to understand context and identify trends in continuous data streams. In this paper, various data stream methods and algorithms are reviewed and evaluated on standard synthetic data streams and real-life data streams. Density-based micro-clustering and density-grid-based clustering algorithms are discussed, and a comparative analysis in terms of various internal and external clustering evaluation methods is performed. It was observed that no single algorithm satisfies all the performance measures. The performance of these data stream clustering algorithms is domain-specific, and the algorithms require many parameters for density and noise thresholds.
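
The single-pass and bounded-memory constraints are commonly met by summarizing points in micro-clusters rather than storing them. The sketch below shows a BIRCH-style clustering feature (count, linear sum, sum of squared norms) that can be updated and merged incrementally. The abstract does not say which summary the evaluated algorithms use, so the class name `ClusteringFeature` and its methods are assumptions for illustration only.

```python
import math
from typing import List, Sequence


class ClusteringFeature:
    """Hypothetical micro-cluster summary: (n, linear sum, sum of squared norms)."""

    def __init__(self, dim: int) -> None:
        self.n = 0                       # number of absorbed points
        self.ls: List[float] = [0.0] * dim  # per-dimension linear sum
        self.ss = 0.0                    # sum of squared Euclidean norms

    def add(self, point: Sequence[float]) -> None:
        """Absorb one point in O(dim) time without storing it."""
        self.n += 1
        self.ls = [s + x for s, x in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def merge(self, other: "ClusteringFeature") -> None:
        """Summaries are additive, so two micro-clusters merge by summing fields."""
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        """Root-mean-square distance of the absorbed points from the centroid."""
        c = self.centroid()
        variance = self.ss / self.n - sum(v * v for v in c)
        return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding error


cf = ClusteringFeature(dim=2)
for p in ([0.0, 0.0], [2.0, 0.0]):
    cf.add(p)
```

A density-based method would typically compare `radius()` and the point count against its density and noise thresholds to decide whether an arriving point joins an existing micro-cluster or starts a new one; those thresholds are the kind of parameters the abstract refers to.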
