scholarly journals Concept Drift Detection in Data Stream Clustering and its Application on Weather Data

Author(s):  
Namitha K. ◽  
Santhosh Kumar G.

This article presents a stream mining framework to cluster the data stream and monitor its evolution. Even though concept drift is expected to be present in data streams, explicit drift detection is rarely done in stream clustering algorithms. The proposed framework is capable of explicit concept drift detection and cluster evolution analysis. Concept drift is caused by the changes in data distribution over time. Relationship between concept drift and the occurrence of physical events has been studied by applying the framework on the weather data stream. Experiments led to the conclusion that the concept drift accompanied by a change in the number of clusters indicates a significant weather event. This kind of online monitoring and its results can be utilized in weather forecasting systems in various ways. Weather data streams produced by automatic weather stations (AWS) are used to conduct this study.

2018 ◽  
Vol 2 (4) ◽  
pp. 32 ◽  
Author(s):  
Umesh Kokate ◽  
Arvind Deshpande ◽  
Parikshit Mahalle ◽  
Pramod Patil

Data growth in today’s world is exponential, many applications generate huge amount of data streams at very high speed such as smart grids, sensor networks, video surveillance, financial systems, medical science data, web click streams, network data, etc. In the case of traditional data mining, the data set is generally static in nature and available many times for processing and analysis. However, data stream mining has to satisfy constraints related to real-time response, bounded and limited memory, single-pass, and concept-drift detection. The main problem is identifying the hidden pattern and knowledge for understanding the context for identifying trends from continuous data streams. In this paper, various data stream methods and algorithms are reviewed and evaluated on standard synthetic data streams and real-life data streams. Density-micro clustering and density-grid-based clustering algorithms are discussed and comparative analysis in terms of various internal and external clustering evaluation methods is performed. It was observed that a single algorithm cannot satisfy all the performance measures. The performance of these data stream clustering algorithms is domain-specific and requires many parameters for density and noise thresholds.


2021 ◽  
Author(s):  
Christian Nordahl ◽  
Veselka Boeva ◽  
Håkan Grahn ◽  
Marie Persson Netz

AbstractData has become an integral part of our society in the past years, arriving faster and in larger quantities than before. Traditional clustering algorithms rely on the availability of entire datasets to model them correctly and efficiently. Such requirements are not possible in the data stream clustering scenario, where data arrives and needs to be analyzed continuously. This paper proposes a novel evolutionary clustering algorithm, entitled EvolveCluster, capable of modeling evolving data streams. We compare EvolveCluster against two other evolutionary clustering algorithms, PivotBiCluster and Split-Merge Evolutionary Clustering, by conducting experiments on three different datasets. Furthermore, we perform additional experiments on EvolveCluster to further evaluate its capabilities on clustering evolving data streams. Our results show that EvolveCluster manages to capture evolving data stream behaviors and adapts accordingly.


2020 ◽  
Vol 7 (1) ◽  
Author(s):  
K. Namitha ◽  
G. Santhosh Kumar

Abstract In the case of real-world data streams, the underlying data distribution will not be static; it is subject to variation over time, which is known as the primary reason for concept drift. Concept drift poses severe problems to the accuracy of a model in online learning scenarios. The recurring concept is a particular case of concept drift where the concepts already seen in the past reappear as the stream evolves. This problem is not yet studied in the context of stream clustering. This paper proposes a novel algorithm for identifying the recurring concepts in data stream clustering. During concept recurrence, the most matching model is retrieved from the repository and reused. The algorithm has minimum memory requirements and works online with the stream. Some of the concepts and definitions, already familiar in concept recurrence studies of stream classification have been redefined for clustering. The experiments conducted on real and synthetic data streams reveal that the proposed algorithm has the potential to identify recurring concepts.


In the fast growing world applications are generating data in enormous volumes called data streams. Data stream is imaginably large, continual, rapid flow of information and in data mining the important tool is called clustering, hence data stream clustering (DSC) can be said as active research area. Recent attention of data stream clustering is through the applications that contain large amounts of streaming data. Data stream clustering is used in many areas such as weather forecasting, financial transactions, website analysis, sensor network monitoring, e-business, telephone records and telecommunications. In case of data stream clustering most popularly used heuristic is K-means and other algorithms like K-medoids and the popular BIRCH are developed. The aim of the abstract is to review the developments and trends of data stream clustering methods and analyze typical DSC algorithms proposed in recent years, such as BIRCH, STREAM, DSTREAM and some more algorithms.


Author(s):  
Bhaskar Adepu ◽  
Jayadev Gyani ◽  
G. Narsimha

A Few algorithms were actualized by the analysts for performing clustering of data streams. Most of these algorithms require that the number of clusters (K) has to be fixed by the customer based on input data and it can be kept settled all through the clustering process. Stream clustering has faced few difficulties in picking up K. In this paper, we propose an efficient approach for data stream clustering by embracing an Improved Differential Evolution (IDE) algorithm. The IDE algorithm is one of the quick, powerful and productive global optimization approach for programmed clustering. In our proposed approach, we additionally apply an entropy based method for distinguishing the concept drift in the data stream and in this way updating the clustering procedure online. We demonstrated that our proposed method is contrasted with Genetic Algorithm and identified as proficient optimization algorithm. The performance of our proposed technique is assessed and cr eates the accuracy of 92.29%, the precision is 86.96%, recall is 90.30% and F-measure estimate is 88.60%.


2019 ◽  
Vol 06 (02) ◽  
pp. 223-256
Author(s):  
Amal Abid ◽  
Salma Jamoussi ◽  
Abdelmajid Ben Hamadou

The spread of real-time applications has led to a huge amount of data shared between users. This vast volume of data rapidly evolving over time is referred to as data stream. Clustering and processing such data poses many challenges to the data mining community. Indeed, traditional data mining techniques become unfeasible to mine such a continuous flow of data where characteristics, features, and concepts are rapidly changing over time. This paper presents a novel method for data stream clustering. In this context, major challenges of data stream processing are addressed, namely, infinite length, concept drift, novelty detection, and feature evolution. To handle these issues, the proposed method uses the Artificial Immune System (AIS) meta-heuristic. The latter has been widely used for data mining tasks and it owns the property of adaptability required by data stream clustering algorithms. Our method, called AIS-Clus, is able to detect novel concepts using the performance of the learning process of the AIS meta-heuristic. Furthermore, AIS-Clus has the ability to adapt its model to handle concept drift and feature evolution for textual data streams. Experimental results have been performed on textual datasets where efficient and promising results are obtained.


2015 ◽  
Vol 77 (18) ◽  
Author(s):  
Maryam Mousavi ◽  
Azuraliza Abu Bakar

In recent years, clustering methods have attracted more attention in analysing and monitoring data streams. Density-based techniques are the remarkable category of clustering techniques that are able to detect the clusters with arbitrary shapes and noises. However, finding the clusters with local density varieties is a difficult task. For handling this problem, in this paper, a new density-based clustering algorithm for data streams is proposed. This algorithm can improve the offline phase of density-based algorithm based on MinPts parameter. The experimental results show that the proposed technique can improve the clustering quality in data streams with different densities.


2014 ◽  
Vol 933 ◽  
pp. 768-773 ◽  
Author(s):  
Wei Hua Ma

Data stream in a popular research topic in big data era. There are many research results on data stream clustering domain. This paper firstly has a brief introduction to data stream methodologies, such as sampling, sliding windows, etc. Finally, it presents a survey on data streams clustering techniques.


2018 ◽  
Vol 7 (2) ◽  
pp. 270 ◽  
Author(s):  
Shyam Sunder Reddy K ◽  
Shoba Bindu C

Real-time data stream clustering has been widely used in many fields, and it can extract useful information from massive sets of data. Most of the existing density-based algorithms cluster the data streams based on the density within the micro-clusters. These algorithms completely omit the data density in the area between the micro-clusters and recluster the micro-clusters based on erroneous assumptions about the distribution of the data within and between the micro-clusters that lead to poor clustering results. This paper describes a novel density-based clustering algorithm for evolving data streams called MCDAStream, which clusters the data stream based on micro-cluster density and attraction between the micro-clusters. The attraction of micro-clusters characterizes the positional information of the data points in each micro-cluster. We generate better clustering results by considering both micro-cluster density and attraction of micro-clusters. The quality of the proposed algorithm is evaluated on various synthetic and real-time datasets with distinct characteristics and quality metrics.


Sign in / Sign up

Export Citation Format

Share Document