Survey on Data Streams Clustering Techniques

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.933.768 ◽

2014 ◽

Vol 933 ◽

pp. 768-773 ◽

Cited By ~ 1

Author(s):

Wei Hua Ma

Keyword(s):

Big Data ◽

Data Streams ◽

Data Stream ◽

Research Topic ◽

Sliding Windows ◽

Clustering Techniques ◽

Research Results ◽

Stream Clustering ◽

Data Stream Clustering

Data streams are a popular research topic in the big data era, and many results have been published on data stream clustering. This paper first gives a brief introduction to data stream methodologies such as sampling and sliding windows, then presents a survey of data stream clustering techniques.
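
The two methodologies named in this abstract, sliding windows and sampling, can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `sliding_window` and `reservoir_sample` are hypothetical, and reservoir sampling is assumed here as one common way to sample a stream of unknown length.

```python
import random
from collections import deque


def sliding_window(stream, size):
    """Yield the most recent `size` items after each arrival."""
    window = deque(maxlen=size)  # the oldest item is evicted automatically
    for item in stream:
        window.append(item)
        yield list(window)


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Both helpers use memory bounded by the window size or by `k`, which is the constraint that motivates these methodologies on unbounded streams.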

IMPROVED DENSITY BASED ALGORITHM FOR DATA STREAM CLUSTERING

Jurnal Teknologi ◽

10.11113/jt.v77.6492 ◽

2015 ◽

Vol 77 (18) ◽

Cited By ~ 2

Author(s):

Maryam Mousavi ◽

Azuraliza Abu Bakar

Keyword(s):

Data Streams ◽

Data Stream ◽

Clustering Algorithm ◽

Local Density ◽

Clustering Methods ◽

Clustering Techniques ◽

Stream Clustering ◽

Density Based Clustering ◽

Clustering Quality ◽

Data Stream Clustering

In recent years, clustering methods have attracted growing attention for analysing and monitoring data streams. Density-based techniques are a notable category of clustering techniques because they can detect arbitrarily shaped clusters and handle noise. However, finding clusters whose local densities vary is a difficult task. To address this problem, this paper proposes a new density-based clustering algorithm for data streams. The algorithm can improve the offline phase of density-based algorithms based on the MinPts parameter. The experimental results show that the proposed technique can improve clustering quality in data streams with different densities.

Clustering Large Datasets Using Data Stream Clustering Techniques

Studies in Classification, Data Analysis, and Knowledge Organization - Data Analysis, Machine Learning and Knowledge Discovery ◽

10.1007/978-3-319-01595-8_15 ◽

2013 ◽

pp. 135-143 ◽

Cited By ~ 2

Author(s):

Matthew Bolaños ◽

John Forrest ◽

Michael Hahsler

Keyword(s):

Data Stream ◽

Large Datasets ◽

Clustering Techniques ◽

Stream Clustering ◽

Data Stream Clustering ◽

Using Data

EvolveCluster: an evolutionary clustering algorithm for streaming data

Evolving Systems ◽

10.1007/s12530-021-09408-y ◽

2021 ◽

Author(s):

Christian Nordahl ◽

Veselka Boeva ◽

Håkan Grahn ◽

Marie Persson Netz

Keyword(s):

Data Streams ◽

Data Stream ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Streaming Data ◽

Evolutionary Clustering ◽

Stream Clustering ◽

The Past ◽

Data Stream Clustering ◽

Evolving Data

Data has become an integral part of our society in the past years, arriving faster and in larger quantities than before. Traditional clustering algorithms rely on the availability of entire datasets to model them correctly and efficiently. Such requirements are not possible in the data stream clustering scenario, where data arrives and needs to be analyzed continuously. This paper proposes a novel evolutionary clustering algorithm, entitled EvolveCluster, capable of modeling evolving data streams. We compare EvolveCluster against two other evolutionary clustering algorithms, PivotBiCluster and Split-Merge Evolutionary Clustering, by conducting experiments on three different datasets. Furthermore, we perform additional experiments on EvolveCluster to further evaluate its capabilities on clustering evolving data streams. Our results show that EvolveCluster manages to capture evolving data stream behaviors and adapts accordingly.

Uncertain Big Data Stream Clustering

Cyber-Physical Systems - Studies in Systems, Decision and Control ◽

10.1007/978-3-030-67892-0_29 ◽

2021 ◽

pp. 361-372

Author(s):

Alisa Makhmutova ◽

Igor Anikin

Keyword(s):

Big Data ◽

Data Stream ◽

Stream Clustering ◽

Data Stream Clustering

Efficient Data Stream Clustering With Sliding Windows Based on Locality-Sensitive Hashing

IEEE Access ◽

10.1109/access.2018.2877138 ◽

2018 ◽

Vol 6 ◽

pp. 63757-63776 ◽

Cited By ~ 6

Author(s):

Jonghem Youn ◽

Junho Shim ◽

Sang-Goo Lee

Keyword(s):

Data Stream ◽

Locality Sensitive Hashing ◽

Sliding Windows ◽

Stream Clustering ◽

Efficient Data ◽

Data Stream Clustering

MCDAStream: a real-time data stream clustering based on micro-cluster density and attraction

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.9051 ◽

2018 ◽

Vol 7 (2) ◽

pp. 270 ◽

Cited By ~ 1

Author(s):

Shyam Sunder Reddy K ◽

Shoba Bindu C

Keyword(s):

Real Time ◽

Data Streams ◽

Data Stream ◽

Clustering Algorithm ◽

Positional Information ◽

Time Data ◽

Stream Clustering ◽

Real Time Data ◽

Cluster Density ◽

Data Stream Clustering

Real-time data stream clustering has been widely used in many fields, and it can extract useful information from massive sets of data. Most existing density-based algorithms cluster data streams based on the density within the micro-clusters. These algorithms completely omit the data density in the area between the micro-clusters, and they recluster the micro-clusters based on erroneous assumptions about the distribution of the data within and between the micro-clusters, which leads to poor clustering results. This paper describes a novel density-based clustering algorithm for evolving data streams called MCDAStream, which clusters the data stream based on micro-cluster density and attraction between the micro-clusters. The attraction of micro-clusters characterizes the positional information of the data points in each micro-cluster. We generate better clustering results by considering both micro-cluster density and attraction of micro-clusters. The quality of the proposed algorithm is evaluated on various synthetic and real-time datasets with distinct characteristics and quality metrics.

Concept Drift Detection in Data Stream Clustering and its Application on Weather Data

International Journal of Agricultural and Environmental Information Systems ◽

10.4018/ijaeis.2020010104 ◽

2020 ◽

Vol 11 (1) ◽

pp. 67-85 ◽

Cited By ~ 1

Author(s):

Namitha K. ◽

Santhosh Kumar G.

Keyword(s):

Data Streams ◽

Data Stream ◽

Weather Forecasting ◽

Concept Drift ◽

Clustering Algorithms ◽

Weather Data ◽

Stream Clustering ◽

Cluster Evolution ◽

Data Stream Clustering ◽

Concept Drift Detection

This article presents a stream mining framework to cluster a data stream and monitor its evolution. Even though concept drift is expected to be present in data streams, explicit drift detection is rarely done in stream clustering algorithms. The proposed framework is capable of explicit concept drift detection and cluster evolution analysis. Concept drift is caused by changes in the data distribution over time. The relationship between concept drift and the occurrence of physical events was studied by applying the framework to a weather data stream. Experiments led to the conclusion that concept drift accompanied by a change in the number of clusters indicates a significant weather event. This kind of online monitoring and its results can be utilized in weather forecasting systems in various ways. Weather data streams produced by automatic weather stations (AWS) are used to conduct this study.
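
The conclusion stated in this abstract, that drift accompanied by a change in the number of clusters signals a significant event, can be expressed as a small rule. The Python sketch below is an illustration only and not the authors' framework: the function name `significant_events` and the per-window input format are assumptions, and the drift flags and cluster counts are taken as given rather than computed.

```python
def significant_events(windows):
    """Return indices of windows where drift coincides with a new cluster count.

    `windows` is a list of (drift_detected, n_clusters) tuples, one per time
    window, assumed to come from some upstream drift detector and clusterer.
    """
    events = []
    for i in range(1, len(windows)):
        drift, n_clusters = windows[i]
        _, previous_n_clusters = windows[i - 1]
        # Flag only drift that is accompanied by a change in cluster count.
        if drift and n_clusters != previous_n_clusters:
            events.append(i)
    return events
```

For example, with the hypothetical input `[(False, 3), (True, 3), (True, 4)]`, only the last window is flagged, because the drift in the second window leaves the cluster count unchanged.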

Learning in the presence of concept recurrence in data stream clustering

Journal Of Big Data ◽

10.1186/s40537-020-00354-1 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

K. Namitha ◽

G. Santhosh Kumar

Keyword(s):

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Synthetic Data ◽

Real World Data ◽

Stream Classification ◽

Stream Clustering ◽

The Past ◽

Data Stream Clustering ◽

Learning Scenarios

In the case of real-world data streams, the underlying data distribution will not be static; it is subject to variation over time, which is known as the primary reason for concept drift. Concept drift poses severe problems to the accuracy of a model in online learning scenarios. The recurring concept is a particular case of concept drift where the concepts already seen in the past reappear as the stream evolves. This problem has not yet been studied in the context of stream clustering. This paper proposes a novel algorithm for identifying the recurring concepts in data stream clustering. During concept recurrence, the most closely matching model is retrieved from the repository and reused. The algorithm has minimum memory requirements and works online with the stream. Some of the concepts and definitions already familiar from concept recurrence studies of stream classification have been redefined for clustering. The experiments conducted on real and synthetic data streams reveal that the proposed algorithm has the potential to identify recurring concepts.
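
The idea of retrieving the best-matching model from a repository when a concept recurs can be sketched as follows. This is a hedged Python illustration, not the paper's algorithm: the class `ModelRepository`, the helper `centroid_distance`, the representation of a model as a list of centroids, and the fixed `threshold` are all assumptions made for the example.

```python
import math


def centroid_distance(model_a, model_b):
    """Average distance from each centroid in model_a to its nearest one in model_b."""
    return sum(min(math.dist(a, b) for b in model_b) for a in model_a) / len(model_a)


class ModelRepository:
    """Stores past clustering models, each a list of centroid tuples."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed maximum distance for a match
        self.models = []

    def match_or_store(self, model):
        """Return (stored_model, True) on recurrence, else store and return (model, False)."""
        best = min(
            self.models,
            key=lambda stored: centroid_distance(model, stored),
            default=None,
        )
        if best is not None and centroid_distance(model, best) <= self.threshold:
            return best, True  # recurring concept: reuse the stored model
        self.models.append(model)  # new concept: remember it for later
        return model, False
```

A real system would also need a policy for bounding the repository size, which this sketch leaves out.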
