Empowering Density-based Micro-clusters In Dynamic Data Stream Clustering

2019 ◽

pp. 19-39

Author(s):

Prasanna Lakshmi Kompalli

Keyword(s):

Real Time ◽

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Data Stream Mining ◽

Time Data ◽

Stream Mining ◽

New Challenges ◽

Mining Data Streams ◽

Different Sources

Data arriving from different sources is referred to as data streams. Data stream mining is an online learning technique in which each data point must be processed as it arrives and discarded once processing is complete. Advances in technology have made it possible to monitor these data streams in real time, and this has created many new challenges for researchers. The main features of this type of data are that it is fast flowing, that it arrives in large volumes that are continuous and growing, and that its characteristics might change over time, which is termed concept drift. This chapter addresses the problems of mining data streams with concept drift. Isolating the correct literature on the topic would be a grueling task for researchers and practitioners. This chapter tries to provide a solution by offering an amalgamation of the techniques used for data stream mining with concept drift.

Mining Data Streams

Advances in Business Information Systems and Analytics - Sentiment Analysis and Knowledge Discovery in Contemporary Business ◽

10.4018/978-1-5225-4999-4.ch014 ◽

2019 ◽

pp. 251-278

Author(s):

Prasanna Lakshmi Kompalli

Keyword(s):

Data Streams ◽

Data Stream ◽

Relevant Information ◽

Research Community ◽

Data Stream Mining ◽

Data Sets ◽

Stream Mining ◽

Real World Problem ◽

Mining Data Streams ◽

Over Time

In recent years, advances in technology have made it possible for most present-day organizations to store and record large streams of data. Such data sets, which grow continuously and rapidly over time, are referred to as data streams. Mining such data streams is a unique opportunity and also a challenging task. Data stream mining is the process of gaining knowledge from continuous and rapid records of data. Due to the increase in streaming information, data stream mining has attracted the research community in the recent past. A voluminous literature has been published in this domain over the past few years. Due to this, isolating the correct literature would be a grueling task for researchers and practitioners. When addressing a real-world problem, it would be even more difficult to find relevant information, as it would be hidden in data streams. This chapter tries to provide a solution by offering an amalgamation of the techniques used for data stream mining.

EvolveCluster: an evolutionary clustering algorithm for streaming data

Evolving Systems ◽

10.1007/s12530-021-09408-y ◽

2021 ◽

Author(s):

Christian Nordahl ◽

Veselka Boeva ◽

Håkan Grahn ◽

Marie Persson Netz

Keyword(s):

Data Streams ◽

Data Stream ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Streaming Data ◽

Evolutionary Clustering ◽

Stream Clustering ◽

The Past ◽

Data Stream Clustering ◽

Evolving Data

Abstract Data has become an integral part of our society in the past years, arriving faster and in larger quantities than before. Traditional clustering algorithms rely on the availability of entire datasets to model them correctly and efficiently. Such requirements cannot be met in the data stream clustering scenario, where data arrives and needs to be analyzed continuously. This paper proposes a novel evolutionary clustering algorithm, entitled EvolveCluster, capable of modeling evolving data streams. We compare EvolveCluster against two other evolutionary clustering algorithms, PivotBiCluster and Split-Merge Evolutionary Clustering, by conducting experiments on three different datasets. Furthermore, we perform additional experiments on EvolveCluster to further evaluate its capabilities on clustering evolving data streams. Our results show that EvolveCluster manages to capture evolving data stream behaviors and adapts accordingly.

Incremental Algorithm for Discovering Frequent Subsequences in Multiple Data Streams

International Journal of Data Warehousing and Mining ◽

10.4018/jdwm.2011100101 ◽

2011 ◽

Vol 7 (4) ◽

pp. 1-20 ◽

Cited By ~ 2

Author(s):

Reem Al-Mulla ◽

Zaher Al Aghbari

Keyword(s):

Data Streams ◽

Data Stream ◽

Clustering Algorithms ◽

Arrival Rate ◽

Time Algorithm ◽

Incremental Algorithm ◽

Multiple Data ◽

Multiple Data Streams ◽

Mining Data Streams ◽

New Applications

In recent years, new applications have emerged that produce data streams, such as stock data and sensor networks. Therefore, finding frequent subsequences, or clusters of subsequences, in data streams is an essential task in data mining. Data streams are continuous in nature, unbounded in size, and have a high arrival rate. Due to these characteristics, traditional clustering algorithms fail to find clusters in data streams effectively. Thus, an efficient incremental algorithm is proposed to find frequent subsequences in multiple data streams. The described approach finds frequent subsequences by clustering the subsequences of a data stream. The proposed algorithm uses a window model to buffer the continuous data streams. Further, it does not recompute the clustering results for the whole data stream at every window, but rather builds on the clustering results of previous windows. The proposed approach also employs a decay value for each discovered cluster to determine when to remove old clusters and retain recent ones. In addition, the proposed algorithm is efficient, as it scans the data streams once, and it is considered an anytime algorithm, since the frequent subsequences are ready at the end of every window.
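The windowing and decay ideas in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: subsequences are reduced to single numbers for brevity, and the names `Cluster`, `process_window`, `radius`, `decay`, and `min_weight` are hypothetical choices made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    centroid: float      # running mean of the members (assumed 1-D for brevity)
    count: int           # number of points absorbed so far
    weight: float = 1.0  # decays each window, reset when the cluster is updated


def process_window(clusters, window, radius=1.0, decay=0.5, min_weight=0.2):
    """Update clusters with one buffered window, reusing earlier results."""
    # Age every existing cluster once per window (hypothetical decay rule).
    for c in clusters:
        c.weight *= decay
    for x in window:
        nearest = min(clusters, key=lambda c: abs(c.centroid - x), default=None)
        if nearest is not None and abs(nearest.centroid - x) <= radius:
            # Absorb the point incrementally instead of reclustering everything.
            nearest.count += 1
            nearest.centroid += (x - nearest.centroid) / nearest.count
            nearest.weight = 1.0
        else:
            clusters.append(Cluster(centroid=x, count=1))
    # Drop clusters that have not been refreshed recently.
    return [c for c in clusters if c.weight >= min_weight]


clusters = []
for window in ([1.0, 1.2, 5.0], [1.1], [1.0], [1.0]):
    clusters = process_window(clusters, window)
    # Results are available at the end of every window.
    print([(round(c.centroid, 2), c.count) for c in clusters])
```

Each point is read once, and the cluster near 5.0 is eventually discarded because no later window refreshes it.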

Incremental Algorithm for Discovering Frequent Subsequences in Multiple Data Streams

Developments in Data Extraction, Management, and Analysis ◽

10.4018/978-1-4666-2148-0.ch012 ◽

2013 ◽

pp. 259-279

Author(s):

Reem Al-Mulla ◽

Zaher Al Aghbari

Keyword(s):

Data Streams ◽

Data Stream ◽

Clustering Algorithms ◽

Arrival Rate ◽

Time Algorithm ◽

Incremental Algorithm ◽

Multiple Data ◽

Multiple Data Streams ◽

Mining Data Streams ◽

New Applications

In recent years, new applications have emerged that produce data streams, such as stock data and sensor networks. Therefore, finding frequent subsequences, or clusters of subsequences, in data streams is an essential task in data mining. Data streams are continuous in nature, unbounded in size, and have a high arrival rate. Due to these characteristics, traditional clustering algorithms fail to find clusters in data streams effectively. Thus, an efficient incremental algorithm is proposed to find frequent subsequences in multiple data streams. The described approach finds frequent subsequences by clustering the subsequences of a data stream. The proposed algorithm uses a window model to buffer the continuous data streams. Further, it does not recompute the clustering results for the whole data stream at every window, but rather builds on the clustering results of previous windows. The proposed approach also employs a decay value for each discovered cluster to determine when to remove old clusters and retain recent ones. In addition, the proposed algorithm is efficient, as it scans the data streams once, and it is considered an anytime algorithm, since the frequent subsequences are ready at the end of every window.

Dealing with Data Streams: Complex Event Processing vs. Data Stream Mining

Computational Science and Its Applications – ICCSA 2020 - Lecture Notes in Computer Science ◽

10.1007/978-3-030-58811-3_1 ◽

2020 ◽

pp. 3-14

Author(s):

Moritz Lange ◽

Arne Koschel ◽

Irina Astrova

Keyword(s):

Data Streams ◽

Data Stream ◽

Complex Event Processing ◽

Data Stream Mining ◽

Event Processing ◽

Stream Mining

An adaptive prediction method based on data stream mining for future driving cycle of vehicle

Proceedings of the Institution of Mechanical Engineers Part D Journal of Automobile Engineering ◽

10.1177/0954407020973152 ◽

2020 ◽

pp. 095440702097315

Author(s):

Cong Liu ◽

Yong Chen ◽

Li Zhao

Keyword(s):

Data Stream ◽

Prediction Accuracy ◽

Transition Probability ◽

Prediction Method ◽

Current Control ◽

Driving Cycle ◽

Streaming Data ◽

Data Stream Mining ◽

Stream Mining ◽

Driving Cycles

Due to complex and changeable driving cycles on urban roads, it is a challenging task for most of the current control strategies used in vehicles to adapt to the driving environment. At the same time, the hardware requirements for storing and processing a massive amount of streaming data are increasing, which leads to excessive accumulated errors and high computational cost. To deal with this problem, an innovative prediction method, based on a Markov chain and data stream mining, is proposed to predict the future driving cycle of vehicles. The state transition probability matrix is updated in real time with data stream mining technology: every time a new record arrives, the expired record in memory is replaced by the newly arrived one, and both the state division and the size of the sliding window can be adjusted adaptively, based on prediction accuracy, for the changing driving cycles. The results show that the proposed method is better suited to predicting changing driving cycles and maintains better prediction accuracy than the traditional method. In addition, with the proposed method, the memory space used for storing temporary records was largely saved, and the computational resources required were reduced.
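As a rough illustration of updating a transition probability matrix over a sliding window, the sketch below keeps only the most recent transitions and swaps the expired one for each new arrival. It assumes a fixed number of discrete states and a fixed window size; the adaptive state division and window sizing described in the abstract are not reproduced, and the class and method names (`SlidingMarkov`, `update`, `transition_row`, `predict_next`) are hypothetical.

```python
from collections import deque


class SlidingMarkov:
    """First-order Markov chain estimated over the last `window_size` transitions."""

    def __init__(self, n_states, window_size):
        self.n = n_states
        self.counts = [[0] * n_states for _ in range(n_states)]
        self.window = deque()   # stored (from_state, to_state) pairs
        self.window_size = window_size
        self.prev = None        # most recently observed state

    def update(self, state):
        if self.prev is not None:
            if len(self.window) == self.window_size:
                # Replace the expired record with the newly arrived one.
                i, j = self.window.popleft()
                self.counts[i][j] -= 1
            self.window.append((self.prev, state))
            self.counts[self.prev][state] += 1
        self.prev = state

    def transition_row(self, state):
        total = sum(self.counts[state])
        if total == 0:
            # No evidence for this state in the window: fall back to uniform.
            return [1.0 / self.n] * self.n
        return [c / total for c in self.counts[state]]

    def predict_next(self):
        # Most probable successor of the current state (call after update()).
        row = self.transition_row(self.prev)
        return max(range(self.n), key=row.__getitem__)


model = SlidingMarkov(n_states=2, window_size=3)
for s in [0, 1, 0, 1]:
    model.update(s)
print(model.transition_row(0), model.predict_next())
```

Memory use is bounded by the window size, because only the counts and the buffered transitions are kept rather than the full history.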

Knowledge Discovery Using Data Stream Mining

Advances in Business Information Systems and Analytics - Social Network Analytics for Contemporary Business Organizations ◽

10.4018/978-1-5225-5097-6.ch012 ◽

2018 ◽

pp. 231-258

Author(s):

Prasanna Lakshmi Kompalli

Keyword(s):

Data Streams ◽

Data Stream ◽

Relevant Information ◽

Research Community ◽

Data Stream Mining ◽

Data Sets ◽

Stream Mining ◽

Real World Problem ◽

Using Data ◽

Over Time

In recent years, advances in technology have made it possible for most present-day organizations to store and record large streams of data. Such data sets, which grow continuously and rapidly over time, are referred to as data streams. Mining such data streams is a unique opportunity and also a challenging task. Data stream mining is the process of gaining knowledge from continuous and rapid records of data. Due to the increase in streaming information, data stream mining has attracted the research community in the recent past. A voluminous literature has been published in this domain over the past few years. Due to this, isolating the correct study would be a grueling task for researchers and practitioners. When addressing a real-world problem, it would be difficult to find relevant information, as it would be hidden in data streams. This chapter tries to provide a solution, as it is an amalgamation of the techniques used for data stream mining.

Research on Distributed Data Stream Mining of Financial Risk Based on Double Privacy Protection

10.21203/rs.3.rs-38957/v1 ◽

2020 ◽

Author(s):

Yuhao Zhao

Keyword(s):

Data Mining ◽

Data Streams ◽

Privacy Protection ◽

Data Stream ◽

Financial Risk ◽

Data Stream Mining ◽

Distributed Data ◽

Stream Mining ◽

Mining Technology ◽

Distributed Data Streams

Abstract With the advancement of network technology and large-scale computing, distributed data streams have been widely used in financial risk analysis. However, while data mining reveals financial models, it also increasingly poses a threat to privacy. Preventing privacy leakage while mining efficiently therefore poses new challenges for data mining technology. This article addresses privacy leakage in financial data mining and combines existing data mining technology with privacy protection. First, a data mining model for dual privacy protection is defined, which better matches the characteristics of distributed data streams while protecting privacy. Second, a privacy-oriented data stream mining algorithm is proposed, which uses random interference technology to effectively protect the original sensitive data. Finally, simulation experiments show that the algorithm is feasible and effective, adapts better to the distribution and dynamic characteristics of distributed data streams, achieves better privacy protection, and effectively reduces communication load.
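The abstract does not specify how its random interference technology works. As a generic illustration of the idea of perturbing sensitive values before they are shared, the sketch below adds zero-mean Gaussian noise to each record. This is an assumption chosen for the example, not the paper's scheme, and `perturb` and `scale` are hypothetical names.

```python
import random


def perturb(values, scale, rng):
    """Return a noisy copy of `values`; each record gets independent zero-mean noise."""
    return [v + rng.gauss(0.0, scale) for v in values]


rng = random.Random(0)  # fixed seed so the example is repeatable
original = [rng.uniform(0.0, 100.0) for _ in range(10_000)]
noisy = perturb(original, scale=1.0, rng=rng)

# Individual records are masked, but aggregates such as the mean stay close,
# because the noise averages out over many records.
mean_original = sum(original) / len(original)
mean_noisy = sum(noisy) / len(noisy)
print(abs(mean_original - mean_noisy))
```

A larger `scale` hides individual values more strongly at the cost of noisier aggregates, which is the usual trade-off in perturbation-based approaches.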

Research on wireless distributed financial risk data stream mining based on dual privacy protection

10.21203/rs.3.rs-38957/v2 ◽

2020 ◽

Author(s):

Yuhao Zhao

Keyword(s):

Data Mining ◽

Data Streams ◽

Privacy Protection ◽

Data Stream ◽

Financial Risk ◽

Data Stream Mining ◽

Distributed Data ◽

Stream Mining ◽

Mining Technology ◽

Distributed Data Streams

Abstract With the advancement of network technology and large-scale computing, distributed data streams have been widely used in financial risk analysis. However, while data mining reveals financial models, it also increasingly poses a threat to privacy. Preventing privacy leakage while mining efficiently therefore poses new challenges for data mining technology. This article addresses privacy leakage in financial data mining and combines existing data mining technology with privacy protection. First, a data mining model for dual privacy protection is defined, which better matches the characteristics of distributed data streams while protecting privacy. Second, a privacy-oriented data stream mining algorithm is proposed, which uses random interference technology to effectively protect the original sensitive data. Finally, simulation experiments show that the algorithm is feasible and effective, adapts better to the distribution and dynamic characteristics of distributed data streams, achieves better privacy protection, and effectively reduces communication load.
