A Clustering Algorithm in Stream Data Using Strong Coreset

Journal of Interconnection Networks ◽

10.1142/s0219265921430118 ◽

2021 ◽

Author(s):

Manmohan Singh ◽

Rajendra Pamula ◽

Alok Kumar

Keyword(s):

Data Mining ◽

Clustering Algorithm ◽

Local Optimum ◽

Reduction Algorithm ◽

Stream Data ◽

Stream Data Mining ◽

Clustering Approach ◽

Approximation Guarantee ◽

Competitive Algorithms ◽

Learning Data

Clustering has many applications in machine learning, data mining, data compression, and pattern recognition. Existing techniques such as Lloyd's algorithm (often simply called k-means) converge to a local optimum and offer no approximation guarantee. To overcome these shortcomings, this paper offers an efficient k-means clustering approach for stream data mining. The coreset is a popular and fundamental concept for k-means clustering in stream data. In each step, a reduction determines a coreset of the inputs with an associated error, where P denotes the number of input points, according to the nested property of coresets. Hence, a small reduction in the error of each step makes the final coreset n times more accurate. This motivated the authors to propose a new coreset-reduction algorithm. The proposed algorithm was executed on the Covertype, Spambase, Census 1990, Bigcross, and Tower datasets. It outperforms competitive algorithms such as StreamKM++, BICO (BIRCH meets Coresets for k-means clustering), and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies).
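
The abstract above relies on repeated coreset reductions over a stream. As a rough illustration only, the following Python sketch shows the generic merge-and-reduce pattern often used for streaming coresets. It is not the authors' reduction algorithm: the class name `MergeReduce`, the function `reduce_bucket`, and the bucket size `m` are hypothetical, and the reduction step is a plain uniform sample standing in for a real coreset construction.

```python
import random


def reduce_bucket(weighted_points, m, rng):
    """Placeholder reduction (an assumption, not the paper's method):
    keep a uniform sample of m points and reweight them so the total
    weight of the bucket is preserved."""
    if len(weighted_points) <= m:
        return list(weighted_points)
    total = sum(w for _, w in weighted_points)
    sample = rng.sample(weighted_points, m)
    return [(p, total / m) for p, _ in sample]


class MergeReduce:
    """Keeps at most one bucket of m weighted points per level.
    Two buckets on the same level are merged and reduced into one
    bucket on the next level, so a bucket on level i has gone
    through i nested reductions."""

    def __init__(self, m, seed=0):
        self.m = m
        self.rng = random.Random(seed)
        self.buffer = []   # raw points not yet forming a full bucket
        self.levels = {}   # level -> list of (point, weight)

    def insert(self, point):
        self.buffer.append((point, 1.0))
        if len(self.buffer) == self.m:
            bucket, self.buffer = self.buffer, []
            level = 0
            # Carry upward like binary addition while a level is occupied.
            while level in self.levels:
                merged = self.levels.pop(level) + bucket
                bucket = reduce_bucket(merged, self.m, self.rng)
                level += 1
            self.levels[level] = bucket

    def coreset(self):
        """Union of the buffer and all stored buckets."""
        result = list(self.buffer)
        for bucket in self.levels.values():
            result.extend(bucket)
        return result
```

Feeding 1000 points with `m=50` leaves two buckets (levels 2 and 4), so `coreset()` returns 100 weighted points whose weights sum to 1000. Because every stored bucket is the product of several nested reductions, the error of a single reduction is paid once per level, which is why the quality of the reduction step matters so much.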

Evolving clustering algorithm based on mixture of typicalities for stream data mining

Future Generation Computer Systems ◽

10.1016/j.future.2020.01.017 ◽

2020 ◽

Vol 106 ◽

pp. 672-684 ◽

Cited By ~ 3

Author(s):

José Maia ◽

Carlos Alberto Severiano ◽

Frederico Gadelha Guimarães ◽

Cristiano Leite de Castro ◽

André Paim Lemos ◽

...

Keyword(s):

Data Mining ◽

Clustering Algorithm ◽

Stream Data ◽

Stream Data Mining

Comparative Study of Different Classification Algorithms for Stream Data Mining Using MOA

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i11.614616 ◽

2018 ◽

Vol 6 (11) ◽

pp. 614-616

Author(s):

Ashish P. Joshi ◽

Biraj V. Patel

Keyword(s):

Data Mining ◽

Comparative Study ◽

Classification Algorithms ◽

Stream Data ◽

Stream Data Mining

Analysis of Classification and Clustering based Novel Class Detection Techniques for Stream Data Mining

International Journal of Engineering Research and ◽

10.17577/ijertv4is100160 ◽

2015 ◽

Vol 4 (10)

Author(s):

Kamini Tandel ◽

Jignasa N. Patel

Keyword(s):

Data Mining ◽

Stream Data ◽

Detection Techniques ◽

Stream Data Mining ◽

Classification And Clustering

Canonical PSO Based K-Means Clustering Approach for Real Datasets

International Scholarly Research Notices ◽

10.1155/2014/414013 ◽

2014 ◽

Vol 2014 ◽

pp. 1-11 ◽

Cited By ~ 1

Author(s):

Lopamudra Dey ◽

Sanjay Chakraborty

Keyword(s):

Data Mining ◽

Air Pollution ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Real Life ◽

Cluster Validity ◽

Validity Assessment ◽

Different Types ◽

Clustering Approach ◽

Validity Measure

The significance and application of clustering are spread over various fields. Clustering is an unsupervised process in data mining, which is why properly evaluating the results and measuring the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measure. Different types of indices are used to solve different types of problems, and index selection depends on the kind of data available. This paper first proposes a Canonical PSO based K-means clustering algorithm, then analyses some important clustering indices (intercluster, intracluster), and then evaluates the effects of those indices on a real-time air pollution database and on wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and hierarchical clustering algorithms. The paper also describes the nature of the clusters and finally compares the performance of these clustering algorithms according to the validity assessment. It also identifies which of these algorithms is most desirable for producing compact clusters on these particular real-life datasets. It deals with the behaviour of these clustering algorithms with respect to validation indices and presents the evaluation results in mathematical and graphical forms.
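
To make the intracluster and intercluster indices mentioned above concrete, here is a minimal Python sketch. The exact index definitions used in the paper are not given in the abstract, so the two formulas below are assumptions: intracluster distance is taken as the mean distance of points to their own centroid (compactness), and intercluster distance as the smallest distance between any two centroids (separability). The function names are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def intra_cluster(clusters):
    """Assumed compactness index: mean distance from each point
    to the centroid of its own cluster (lower is better)."""
    total = sum(math.dist(p, centroid(c)) for c in clusters for p in c)
    count = sum(len(c) for c in clusters)
    return total / count


def inter_cluster(clusters):
    """Assumed separability index: minimum distance between any
    two cluster centroids (higher is better)."""
    cents = [centroid(c) for c in clusters]
    return min(
        math.dist(a, b)
        for i, a in enumerate(cents)
        for b in cents[i + 1:]
    )


clusters = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
validity = intra_cluster(clusters) / inter_cluster(clusters)
```

For the two toy clusters above the centroids are (0, 1) and (10, 1), so the intracluster value is 1.0, the intercluster value is 10.0, and the ratio is 0.1. A smaller ratio indicates compact, well-separated clusters, which is one simple way to compare the output of different clustering algorithms on the same data.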

Stream Data Mining

Encyclopedia of GIS ◽

10.1007/978-3-319-17885-1_101358 ◽

2017 ◽

pp. 2212-2212

Keyword(s):

Data Mining ◽

Stream Data ◽

Stream Data Mining

SERA: Selectively recursive approach towards nonstationary imbalanced stream data mining

2009 International Joint Conference on Neural Networks ◽

10.1109/ijcnn.2009.5178874 ◽

2009 ◽

Cited By ~ 44

Author(s):

Sheng Chen ◽

Haibo He

Keyword(s):

Data Mining ◽

Stream Data ◽

Stream Data Mining

An Evolutionary Clustering Approach to Pareto Solutions in Multiobjective Optimization

Volume 2: 28th Design Automation Conference ◽

10.1115/detc2002/dac-34048 ◽

2002 ◽

Cited By ~ 4

Author(s):

Min Joong Jeong ◽

Sinobu Yoshimura

Keyword(s):

Multiobjective Optimization ◽

Engineering Design ◽

Function Space ◽

Clustering Algorithm ◽

Computational Effort ◽

Local Optimum ◽

Pareto Solutions ◽

Evolutionary Clustering ◽

Interpretation Process ◽

Clustering Approach

Pareto solutions in multiobjective optimization are difficult to characterize for engineering design because of their considerable variety in function space and parameter space. To overcome this, a clustering-based interpretation process for Pareto solutions is considered. To obtain a more competitive clustering algorithm, we propose an evolutionary clustering algorithm (ECA). The ECA requires less computational effort and overcomes the local optimum problem of the K-means clustering algorithm and its related algorithms. The effectiveness of the method is examined in detail through comparison with other algorithms.

Comparative Analysis of Drift Detection Based Adaptive Ensemble Model with Different Drift Detection Techniques

Journal of University of Shanghai for Science and Technology ◽

10.51201/jusst/21/06492 ◽

2021 ◽

Vol 23 (06) ◽

pp. 49-55

Author(s):

Sanjeev Kumar ◽

Ravendra Singh

Keyword(s):

Data Mining ◽

Comparative Analysis ◽

False Positive ◽

Opinion Mining ◽

Concept Drift ◽

Ensemble Classifier ◽

Stream Data ◽

Detection Techniques ◽

Stream Data Mining ◽

Detection Algorithms

Stream data mining is a popular research area these days. Concept drift detection and drift handling are the biggest challenges of stream data mining. Several drift detection algorithms have been developed that can accurately detect various drifts but suffer from false-positive drift detection. False-positive drift detection degrades the performance of the classifier because it triggers unnecessary training. Classifier ensembles have shown their efficiency for drift detection, drift handling, and classification. However, ensemble classifiers cannot detect the exact position at which a drift occurs, so they have to update themselves at some fixed interval, which places an unnecessary computational burden on the system. Combining a drift detection algorithm with an ensemble classifier can improve performance and also solve the problems of false-positive drift detection and unnecessary updating of the ensemble classifier. In this paper, a model is proposed that creates a weighted adaptive ensemble classifier by updating it only when a drift detection signal is given by the drift detection method in use. The proposed model is evaluated on text-based stream data for sentiment analysis and opinion mining with multiple drift detection algorithms and with multiple classification algorithms as base classifiers for the ensemble. A comparative analysis has been done, and the results show the efficiency of the proposed models.
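
The control flow described above, where the weighted ensemble is updated only when a detector signals drift, can be sketched in Python as follows. This is an illustrative toy under stated assumptions, not the model from the paper: `MajorityClass` is a trivial stand-in base classifier, `WindowDriftDetector` is a hypothetical windowed error-rate detector rather than one of the published drift detection algorithms, and the weight discount `beta` is an assumed parameter.

```python
from collections import Counter, deque


class MajorityClass:
    """Toy base classifier: always predicts its training majority label."""

    def __init__(self, labels):
        self.label = Counter(labels).most_common(1)[0][0]

    def predict(self, x):
        return self.label


class WindowDriftDetector:
    """Hypothetical detector: signals drift when the error rate over the
    last `window` predictions exceeds the best rate seen by `margin`."""

    def __init__(self, window=20, margin=0.25):
        self.errors = deque(maxlen=window)
        self.margin = margin
        self.best = 1.0

    def update(self, is_error):
        self.errors.append(1 if is_error else 0)
        if len(self.errors) < self.errors.maxlen:
            return False
        rate = sum(self.errors) / len(self.errors)
        self.best = min(self.best, rate)
        if rate > self.best + self.margin:
            self.errors.clear()   # start a fresh window after the signal
            self.best = 1.0
            return True
        return False


def run_stream(stream, warmup=10, beta=0.5):
    """Returns (number of errors, number of ensemble updates)."""
    labels = [y for _, y in stream[:warmup]]
    members = [[MajorityClass(labels), 1.0]]   # [model, weight] pairs
    recent = deque(labels, maxlen=warmup)      # latest labels for retraining
    detector = WindowDriftDetector()
    errors = updates = 0
    for x, y in stream[warmup:]:
        votes = Counter()
        for model, weight in members:
            votes[model.predict(x)] += weight
        wrong = votes.most_common(1)[0][0] != y
        errors += wrong
        for member in members:                 # discount members that erred
            if member[0].predict(x) != y:
                member[1] *= beta
        recent.append(y)
        if detector.update(wrong):             # update only on a drift signal
            members.append([MajorityClass(recent), 1.0])
            updates += 1
    return errors, updates
```

On a stream of 100 examples labelled 0 followed by 100 labelled 1, `run_stream` makes 6 errors before the detector fires and performs exactly one ensemble update; a fixed-interval scheme would have retrained repeatedly even while the concept was stable.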

On the Hermite Series-Based Generalized Regression Neural Networks for Stream Data Mining

Neural Information Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-030-36718-3_37 ◽

2019 ◽

pp. 437-448

Author(s):

Danuta Rutkowska ◽

Leszek Rutkowski

Keyword(s):

Data Mining ◽

Neural Networks ◽

Stream Data ◽

Stream Data Mining ◽

Generalized Regression Neural Networks ◽

Generalized Regression ◽

Hermite Series

Efficiently Mining Recent Frequent Patterns over Online Transactional Data Streams

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194009004325 ◽

2009 ◽

Vol 19 (05) ◽

pp. 707-725 ◽

Cited By ~ 1

Author(s):

Hui Chen

Keyword(s):

Data Mining ◽

Data Stream ◽

Frequent Patterns ◽

Stream Data ◽

Data Stream Management ◽

Online Data ◽

Stream Data Mining ◽

Network Traffic Analysis ◽

Stream Management ◽

Performance Results

Recent emerging applications, such as network traffic analysis, web click stream mining, power consumption measurement, sensor network data analysis, and dynamic tracing of stock fluctuation, call for the study of a new kind of data: stream data. Many data stream management systems, prototype systems, and software components have been developed to manage streams or extract knowledge from stream data. Mining frequent patterns is a foundational task for data mining and knowledge discovery methods. This paper proposes an algorithm for mining the recent frequent patterns over an online data stream. The method uses an RFP-tree to compactly store the recent frequent patterns of a stream. The content of each transaction is incrementally updated into the pattern tree upon its arrival, so the stream is scanned only once. Moreover, a conservative computation strategy and a time decaying model are used to ensure the correctness of the mining results. Finally, the results of extensive simulation show that our work reduces the average processing time per stream data element and is superior to other analogous algorithms.
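
The time decaying model mentioned above can be illustrated with a small Python sketch. It does not implement the RFP-tree, whose structure is not described in the abstract; instead it tracks decayed support for single items only, under the assumption that every count is multiplied by a fixed factor `decay` for each newly arrived transaction. The class name `DecayedCounter` and its parameters are hypothetical.

```python
class DecayedCounter:
    """One-pass, time-decayed support counts for single items.
    Each stored count is decayed lazily, only when it is touched."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.t = 0           # number of transactions seen so far
        self.total = 0.0     # decayed number of transactions
        self.counts = {}     # item -> (decayed count, time of last update)

    def add(self, transaction):
        self.t += 1
        self.total = self.total * self.decay + 1.0
        for item in set(transaction):
            count, last = self.counts.get(item, (0.0, self.t))
            aged = count * self.decay ** (self.t - last)
            self.counts[item] = (aged + 1.0, self.t)

    def frequent(self, min_support):
        """Items whose decayed support is at least min_support."""
        result = {}
        for item, (count, last) in self.counts.items():
            current = count * self.decay ** (self.t - last)
            if current >= min_support * self.total:
                result[item] = current
        return result


counter = DecayedCounter(decay=0.5)
for transaction in (["a", "b"], ["a"], ["a"]):
    counter.add(transaction)
recent = counter.frequent(min_support=0.5)
```

With `decay=0.5`, item `a` ends with a decayed count of 1.75 out of a decayed total of 1.75, while `b`, seen only in the oldest transaction, has faded to 0.25, so `recent` contains only `a`. Older transactions therefore contribute less and less, which is how a decay model keeps the result focused on recent patterns.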
