stream data mining
Recently Published Documents


TOTAL DOCUMENTS

50
(FIVE YEARS 8)

H-INDEX

8
(FIVE YEARS 2)

Author(s):  
Manmohan Singh ◽  
Rajendra Pamula ◽  
Alok Kumar

Clustering has many applications in machine learning, data mining, data compression, and pattern recognition. Existing techniques such as Lloyd's algorithm (often called k-means) can converge to a local optimum and offer no approximation guarantee. To overcome these shortcomings, this paper offers an efficient k-means clustering approach for stream data mining. The coreset is a popular and fundamental concept for k-means clustering on stream data. In each step, the reduction determines a coreset of the inputs and represents the error, where P represents the number of input points, according to the nested property of coresets. Hence, even a small reduction in error makes the final coreset n times more accurate. This motivated the authors to propose a new coreset-reduction algorithm. The proposed algorithm was executed on the Covertype, Spambase, Census 1990, Bigcross, and Tower datasets. It outperforms competing algorithms such as StreamKM++, BICO (BIRCH meets Coresets for k-means clustering), and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies).
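The abstract does not give the proposed reduction step, so the following is only a minimal sketch of the general merge-and-reduce pattern that streaming coreset methods commonly build on. The names `reduce_coreset` and `MergeReduceStream`, the bucket size `m`, and the weight-proportional sampling used as the reduction are all assumptions made for illustration; they are not the authors' algorithm. In schemes of this kind every reduction level can add approximation error, which is why the quality of a single reduction step matters for the final coreset.

```python
import numpy as np


def reduce_coreset(points, weights, m, rng):
    """Shrink a weighted point set to at most m weighted points.

    Assumption: weight-proportional sampling stands in for a real
    coreset construction; it only illustrates where reduction happens.
    """
    if len(points) <= m:
        return points, weights
    total = weights.sum()
    idx = rng.choice(len(points), size=m, replace=True, p=weights / total)
    # Every sample gets an equal share, so the total weight is preserved.
    return points[idx], np.full(m, total / m)


class MergeReduceStream:
    """Hypothetical merge-and-reduce container: one bucket per level."""

    def __init__(self, m, seed=0):
        self.m = m
        self.rng = np.random.default_rng(seed)
        self.levels = {}   # level -> (points, weights)
        self.buffer = []   # raw points not yet forming a full bucket

    def insert(self, x):
        self.buffer.append(np.asarray(x, dtype=float))
        if len(self.buffer) == self.m:
            pts = np.vstack(self.buffer)
            self.buffer = []
            self._carry(pts, np.ones(len(pts)), level=0)

    def _carry(self, pts, w, level):
        # Like binary addition: two buckets on one level are merged,
        # reduced back to m points, and pushed one level up.
        while level in self.levels:
            other_pts, other_w = self.levels.pop(level)
            pts = np.vstack([pts, other_pts])
            w = np.concatenate([w, other_w])
            pts, w = reduce_coreset(pts, w, self.m, self.rng)
            level += 1
        self.levels[level] = (pts, w)

    def coreset(self):
        """Union of all buckets plus any unbuffered raw points."""
        parts = list(self.levels.values())
        if self.buffer:
            parts.append((np.vstack(self.buffer), np.ones(len(self.buffer))))
        if not parts:
            raise ValueError("no points seen yet")
        pts = np.vstack([p for p, _ in parts])
        w = np.concatenate([w for _, w in parts])
        return pts, w
```

A weighted k-means run on the returned points and weights would then stand in for clustering the full stream; the number of stored points grows only with the number of occupied levels, not with the stream length.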


2021 ◽  
Vol 23 (06) ◽  
pp. 49-55
Author(s):  
Sanjeev Kumar ◽  
Ravendra Singh ◽  

Stream data mining is a popular research area these days. Concept drift detection and drift handling are its biggest challenges. Several drift detection algorithms have been developed that can accurately detect various drifts but suffer from false-positive drift detection. False-positive detections degrade the performance of the classifier because they trigger unnecessary training during analysis. Classifier ensembles have proven efficient for drift detection, drift handling, and classification. However, ensemble classifiers cannot detect the exact position at which a drift occurs, so they have to update themselves at a fixed interval, which places an unnecessary computational burden on the system. Combining a drift detection algorithm with an ensemble classifier can improve performance and also solve the problems of false-positive drift detection and unnecessary updating of the ensemble classifier. In this paper, a model is proposed that creates a weighted adaptive ensemble classifier by updating it only when the drift detection method in use signals a drift. The proposed model is evaluated on text-based stream data for sentiment analysis and opinion mining, with multiple drift detection algorithms and with multiple classification algorithms as base classifiers for the ensemble. A comparative analysis has been carried out, and the results show the efficiency of the proposed models.
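The idea of updating a weighted ensemble only when a detector signals drift can be sketched as follows. This is an illustration under stated assumptions, not the paper's model: `SimpleDDM` is a simplified error-rate detector in the style of DDM, `CentroidLearner` is a toy one-dimensional base classifier, and `DriftTriggeredEnsemble`, `toy_stream`, the window size, and the numeric stream (instead of text data) are all hypothetical choices.

```python
import math
from collections import deque


class SimpleDDM:
    """Simplified DDM-style detector over a stream of 0/1 errors (assumption)."""

    def __init__(self, min_samples=30, drift_level=3.0):
        self.min_samples, self.drift_level = min_samples, drift_level
        self.reset()

    def reset(self):
        self.n = self.errors = 0
        self.p_min = self.s_min = math.inf

    def update(self, error):
        self.n += 1
        self.errors += int(error)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.min_samples:
            return False
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        # Drift when the error rate rises well above its recorded minimum.
        return p + s > self.p_min + self.drift_level * self.s_min


class CentroidLearner:
    """Toy 1-D nearest-centroid classifier used as a base learner."""

    def fit(self, xs, ys):
        self.centroids = {}
        for label in set(ys):
            vals = [x for x, y in zip(xs, ys) if y == label]
            self.centroids[label] = sum(vals) / len(vals)
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda c: abs(x - self.centroids[c]))


class DriftTriggeredEnsemble:
    """Weighted ensemble that is rebuilt only after a drift signal."""

    def __init__(self, window=100, max_members=5):
        self.window = deque(maxlen=window)
        self.max_members = max_members
        self.members = []          # list of [model, weight]
        self.detector = SimpleDDM()
        self.pending = True        # True while collecting a fresh window
        self.updates = 0

    def predict(self, x):
        if not self.members:
            return 0               # arbitrary default before first training
        votes = {}
        for model, weight in self.members:
            label = model.predict(x)
            votes[label] = votes.get(label, 0.0) + weight
        return max(votes, key=votes.get)

    def _rebuild(self):
        xs = [x for x, _ in self.window]
        ys = [y for _, y in self.window]
        self.members.append([CentroidLearner().fit(xs, ys), 1.0])
        for member in self.members:   # weight = accuracy on the recent window
            hits = sum(member[0].predict(x) == y for x, y in self.window)
            member[1] = hits / len(self.window)
        self.members.sort(key=lambda m: m[1], reverse=True)
        del self.members[self.max_members:]
        self.updates += 1

    def learn_one(self, x, y):
        error = self.predict(x) != y
        self.window.append((x, y))
        if self.pending:
            if len(self.window) == self.window.maxlen:
                self._rebuild()
                self.detector.reset()
                self.pending = False
        elif self.detector.update(error):
            self.window.clear()       # discard data from the old concept
            self.pending = True
        return error


def toy_stream(n=1000, flip_at=500):
    """Hypothetical stream whose labels flip once, simulating abrupt drift."""
    for i in range(n):
        x = ((i * 37) % 100 + 0.5) / 100
        label = int(x > 0.5)
        yield x, (label if i < flip_at else 1 - label)


model = DriftTriggeredEnsemble()
errors = [model.learn_one(x, y) for x, y in toy_stream()]
print("ensemble updates:", model.updates)
```

Between drift signals the ensemble is left untouched, so the cost of retraining and reweighting is paid only when the detector fires. On this noise-free toy stream a single error is enough to trigger the detector; on noisy data the threshold would depend on the recorded minimum error rate.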


2021 ◽  
Vol 9 (2) ◽  
pp. 36-52
Author(s):  
Mashaal A. Alfhaid ◽  
Manal Abdullah

As the amount of generated data increases every day, data mining and knowledge extraction have grown in importance. In traditional data mining, knowledge can be extracted offline. Stream data mining is different, however, because data arrive continuously and can be processed in only a single scan, and because concept drift may appear. As the pre-processing stage is critical in knowledge extraction, imbalanced stream data have gained significant popularity among researchers in the last few years. Many real-world applications suffer from class imbalance, including medical, business, and fraud detection applications. Supervised learning involves classes, whether binary or multi-class. These classes are often imbalanced, divided into a majority (negative) class and a minority (positive) class, which can cause a bias toward the majority class and skew the predictive performance of models. Handling imbalanced streaming data is therefore mandatory for more accurate and reliable learning models. In this paper, we present an overview of data stream mining and its tools. We also summarize the problem of class imbalance and the different approaches to it. In addition, we present the popular evaluation metrics and the challenges arising from imbalanced streaming data.
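The bias toward the majority class described above can be made concrete with a small calculation. The abstract does not name the evaluation metrics it surveys, so the choice of accuracy, recall, specificity, and G-mean here is an assumption, as are the function name `imbalance_report` and the 5-to-95 class split.

```python
import math


def imbalance_report(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a binary problem with a minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0        # minority-class recall
    specificity = tn / (tn + fp) if tn + fp else 0.0   # majority-class recall
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "specificity": specificity,
        "g_mean": math.sqrt(recall * specificity),
    }


# 5 minority (positive) and 95 majority (negative) instances, scored
# against a classifier that always predicts the majority class.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
report = imbalance_report(y_true, y_pred)
print(report)
```

Accuracy looks high for this classifier even though it never finds a minority instance, while recall and G-mean expose the failure, which is the kind of skew that motivates imbalance-aware metrics.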


2020 ◽  
Author(s):  
Pericles Stavros Giannaris

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT REQUEST OF AUTHOR.] Free-text sections of diagnostic reports contain a wealth of data on patients, diseases, and complex diagnostic processes. However, free-text data are a poor starting point for computer-based analytics. The majority of natural language processing (NLP) based approaches lack the capacity to accurately extract complex diagnostic entities and their relationships, as well as to provide adequate knowledge representation (KR) for downstream data mining applications. To overcome these limitations, a novel informatics framework is introduced for the analysis of free-text diagnostic reports. The framework is based on artificial intelligence (AI) modeling. Here, AI-based modeling integrates natural language processing information extraction techniques (NLP-IE), ontology-based knowledge representation, n-ary relations according to ontological patterns, and information entropy-based data mining approaches. Diagnostic reports are transformed into knowledge graphs (KGs) of relational triples, with the goal of facilitating computer-based analysis of the reports. This informatics framework has the potential to broadly impact diagnostic medicine and to be extended to other biomedical domains as well.


2020 ◽  
Vol 106 ◽  
pp. 672-684 ◽  
Author(s):  
José Maia ◽  
Carlos Alberto Severiano ◽  
Frederico Gadelha Guimarães ◽  
Cristiano Leite de Castro ◽  
André Paim Lemos ◽  
...  
