Stream Data Mining

2017 ◽  
pp. 2212-2212
Author(s):  
Manmohan Singh ◽  
Rajendra Pamula ◽  
Alok Kumar

Clustering has various applications in machine learning, data mining, data compression, and pattern recognition. Existing techniques such as Lloyd's algorithm (often called k-means) can converge to a local optimum and offer no approximation guarantee. To overcome these shortcomings, this paper offers an efficient k-means clustering approach for stream data mining. The coreset is a popular and fundamental concept for k-means clustering on stream data. In each step, the reduction determines a coreset of the inputs and represents the error, where P represents the number of input points, according to the nested property of coresets. Hence, a small reduction in the error of the final coreset makes it n times more accurate. This motivated the authors to propose a new coreset-reduction algorithm. The proposed algorithm was executed on the Covertype, Spambase, Census 1990, Bigcross, and Tower datasets. It outperforms competing algorithms such as StreamKM++, BICO (BIRCH meets Coresets for k-means clustering), and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies).
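
The nested use of coresets on a stream can be illustrated with a merge-and-reduce bucket structure, a common way to maintain a summary of bounded size. The sketch below is not the paper's coreset-reduction algorithm: the class name `MergeReduceStream`, the `bucket_size` parameter, and the reduction step (plain weight-proportional sampling, which carries no coreset guarantee) are assumptions chosen only to show how summaries are merged and reduced level by level, and why reduction error can accumulate with each level.

```python
import random


def reduce_points(weighted_points, size, rng):
    """Placeholder reduction (an assumption, not a true coreset construction):
    sample `size` points in proportion to weight and rescale the weights
    so that the total weight is preserved."""
    if len(weighted_points) <= size:
        return list(weighted_points)
    weights = [w for _, w in weighted_points]
    total = sum(weights)
    sample = rng.choices(weighted_points, weights=weights, k=size)
    return [(point, total / size) for point, _ in sample]


class MergeReduceStream:
    """Keeps one summary per level; level i covers bucket_size * 2**i inputs."""

    def __init__(self, bucket_size, seed=0):
        self.bucket_size = bucket_size
        self.rng = random.Random(seed)
        self.buffer = []   # raw points that do not yet fill a bucket
        self.levels = []   # levels[i] is None or a list of (point, weight)

    def insert(self, point):
        self.buffer.append((point, 1.0))
        if len(self.buffer) < self.bucket_size:
            return
        carry, self.buffer = self.buffer, []
        level = 0
        while True:
            if level == len(self.levels):
                self.levels.append(None)
            if self.levels[level] is None:
                self.levels[level] = carry
                return
            # Merge two summaries of the same level, then reduce the union.
            # Each reduction may add error, so deeper levels are less exact.
            merged = self.levels[level] + carry
            carry = reduce_points(merged, self.bucket_size, self.rng)
            self.levels[level] = None
            level += 1

    def summary(self):
        """Union of all stored summaries; a weighted k-means can run on it."""
        out = list(self.buffer)
        for bucket in self.levels:
            if bucket is not None:
                out.extend(bucket)
        return out
```

For a stream of n points, the structure stores at most one bucket per level, so memory grows with the logarithm of n rather than with n. A weighted k-means routine would then be run on `summary()` instead of on the full stream.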


2020 ◽  
Vol 106 ◽  
pp. 672-684 ◽  
Author(s):  
José Maia ◽  
Carlos Alberto Severiano ◽  
Frederico Gadelha Guimarães ◽  
Cristiano Leite de Castro ◽  
André Paim Lemos ◽  
...  

2021 ◽  
Vol 23 (06) ◽  
pp. 49-55
Author(s):  
Sanjeev Kumar ◽  
Ravendra Singh

Stream data mining is an active research area. Concept drift detection and drift handling are its biggest challenges. Several drift detection algorithms have been developed that can accurately detect various kinds of drift, but they suffer from false-positive detections. A false-positive detection degrades classifier performance because it triggers unnecessary retraining between analyses. Classifier ensembles have proven efficient for drift detection, drift handling, and classification. However, an ensemble classifier cannot detect the exact position at which a drift occurs, so it has to update itself at some fixed interval, which places an unnecessary computational burden on the system. Combining a drift detection algorithm with an ensemble classifier can improve performance and also address both false-positive drift detection and unnecessary updating of the ensemble. In this paper, a model is proposed that creates a weighted adaptive ensemble classifier and updates it only when the drift detection method in use signals a drift. The proposed model is evaluated on text-based stream data for sentiment analysis and opinion mining, with multiple drift detection algorithms and with multiple classification algorithms as base classifiers for the ensemble. A comparative analysis shows the efficiency of the proposed models.
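
The idea of updating a weighted ensemble only on a drift signal can be sketched as a test-then-train loop. Everything named below is hypothetical and chosen for illustration: `WindowErrorDetector` is a deliberately simple stand-in (it fires when the error rate over the last few predictions exceeds a threshold) rather than any of the detectors evaluated in the paper, `MajorityLabel` is a toy base classifier that ignores features, and the weight `decay` and `max_members` settings are assumptions.

```python
from collections import Counter, deque


class WindowErrorDetector:
    """Stand-in detector: signals drift when the recent error rate is high."""

    def __init__(self, size=10, threshold=0.5):
        self.errors = deque(maxlen=size)
        self.threshold = threshold

    def update(self, error):
        self.errors.append(error)
        full = len(self.errors) == self.errors.maxlen
        return full and sum(self.errors) / len(self.errors) > self.threshold

    def reset(self):
        self.errors.clear()


class MajorityLabel:
    """Toy base classifier: always predicts its most common training label."""

    def __init__(self, labels):
        self.label = Counter(labels).most_common(1)[0][0]

    def predict(self):
        return self.label


class DriftTriggeredEnsemble:
    def __init__(self, first_labels, decay=0.5, max_members=5):
        self.decay = decay
        self.max_members = max_members
        self.members = [[1.0, MajorityLabel(first_labels)]]

    def predict(self):
        votes = Counter()
        for weight, member in self.members:
            votes[member.predict()] += weight
        return votes.most_common(1)[0][0]

    def update(self, recent_labels):
        # Down-weight older members, then add one trained on recent data.
        for entry in self.members:
            entry[0] *= self.decay
        self.members.append([1.0, MajorityLabel(recent_labels)])
        self.members = self.members[-self.max_members:]


def run(labels, warmup=10, recent_size=10):
    ensemble = DriftTriggeredEnsemble(labels[:warmup])
    detector = WindowErrorDetector(size=recent_size)
    recent = deque(maxlen=recent_size)
    update_points, mistakes = [], 0
    for t in range(warmup, len(labels)):
        y = labels[t]
        error = int(ensemble.predict() != y)   # test first
        mistakes += error
        recent.append(y)
        if detector.update(error):             # update only on a drift signal
            ensemble.update(list(recent))
            detector.reset()
            update_points.append(t)
    return update_points, mistakes
```

On a stream such as `["a"] * 100 + ["b"] * 100`, the ensemble is rebuilt once, shortly after the change point, instead of at every fixed interval. A real detector and real base classifiers would replace the two toy classes without changing the loop.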


Author(s):  
HUI CHEN

Emerging applications such as network traffic analysis, web click stream mining, power consumption measurement, sensor network data analysis, and dynamic tracing of stock fluctuations call for the study of a new kind of data: stream data. Many data stream management systems, prototype systems, and software components have been developed to manage streams or extract knowledge from them. Mining frequent patterns is a foundational task in data mining and knowledge discovery. This paper proposes an algorithm for mining recent frequent patterns over an online data stream. The method uses an RFP-tree to compactly store the recent frequent patterns of a stream. The content of each transaction is incrementally updated into the pattern tree upon arrival, so the stream is scanned only once. Moreover, a conservative computation strategy and a time-decaying model are used to ensure the correctness of the mining results. Finally, extensive simulation results show that the method reduces the average processing time per stream data element and is superior to other analogous algorithms.
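
The time-decaying model can be illustrated with a one-pass counter in which every older transaction loses weight by a constant factor at each step. This is a minimal sketch under stated assumptions, not the RFP-tree: it stores itemsets in a flat dictionary, enumerates only itemsets up to `max_len` items, and does no pruning, so it omits the paper's conservative computation strategy. The class name `DecayedPatternCounter` and its parameters are hypothetical.

```python
from itertools import combinations


class DecayedPatternCounter:
    def __init__(self, decay=0.9, max_len=2):
        self.decay = decay        # weight multiplier applied per time step
        self.max_len = max_len    # largest itemset size that is tracked
        self.t = 0                # number of transactions seen so far
        self.total = 0.0          # decayed number of transactions
        self.counts = {}          # itemset -> (decayed count, last update time)

    def add(self, transaction):
        """Process one transaction; the stream is scanned only once."""
        self.t += 1
        self.total = self.total * self.decay + 1.0
        items = sorted(set(transaction))
        for k in range(1, self.max_len + 1):
            for itemset in combinations(items, k):
                count, last = self.counts.get(itemset, (0.0, self.t))
                # Lazy decay: age the stored count only when it is touched.
                aged = count * self.decay ** (self.t - last)
                self.counts[itemset] = (aged + 1.0, self.t)

    def frequent(self, min_support):
        """Return itemsets whose decayed support is at least min_support."""
        result = {}
        for itemset, (count, last) in self.counts.items():
            support = count * self.decay ** (self.t - last) / self.total
            if support >= min_support:
                result[itemset] = support
        return result
```

With a small `decay`, itemsets that stopped appearing fall below the support threshold quickly, so the result reflects recent rather than historical patterns. A practical version would also evict entries whose decayed count becomes negligible to bound memory.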

