stream data mining Latest Research Papers

A Clustering Algorithm in Stream Data Using Strong Coreset

Clustering has many applications in machine learning, data mining, data compression, and pattern recognition. Existing techniques such as Lloyd's algorithm (sometimes called k-means) can converge to a local optimum and offer no approximation guarantee. To overcome these shortcomings, this paper offers an efficient k-means clustering approach for stream data mining. The coreset is a popular and fundamental concept for k-means clustering in stream data. In each step, the reduction determines a coreset of the inputs and represents the error, where P represents the number of input points, according to the nested property of coresets. Hence, a slight reduction in the error of the final coreset makes it n times more accurate. This motivated the author to propose a new coreset-reduction algorithm. The proposed algorithm was executed on the Covertype, Spambase, Census 1990, Bigcross, and Tower datasets. Our algorithm outperforms competing algorithms such as StreamKM++, BICO (BIRCH meets Coresets for k-means clustering), and BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies).
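The local-optimum issue raised in this abstract can be illustrated with a minimal sketch of Lloyd's iterations. This is not the paper's coreset algorithm; the helper names (`dist2`, `mean`, `lloyd`) and the four-point dataset are assumptions chosen only for illustration.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def lloyd(points, centers, iters=20):
    """Run Lloyd's iterations; return (final centers, k-means cost)."""
    centers = list(centers)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    cost = sum(min(dist2(p, c) for c in centers) for p in points)
    return centers, cost


# Four corners of a wide, flat rectangle (illustrative data, not from the paper).
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
_, bad_cost = lloyd(points, [(5, 0), (5, 1)])        # splits top from bottom
_, good_cost = lloyd(points, [(0, 0.5), (10, 0.5)])  # splits left from right
print(bad_cost, good_cost)
```

With these starting centers the first run stays at a cost of 100 while the second reaches a cost of 1, so the quality of the result depends entirely on initialization.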

Comparative Analysis of Drift Detection Based Adaptive Ensemble Model with Different Drift Detection Techniques

Journal of University of Shanghai for Science and Technology ◽ 10.51201/jusst/21/06492 ◽ 2021 ◽ Vol 23 (06) ◽ pp. 49-55

Author(s): Sanjeev Kumar ◽ Ravendra Singh

Keyword(s): Data Mining ◽ Comparative Analysis ◽ False Positive ◽ Opinion Mining ◽ Concept Drift ◽ Ensemble Classifier ◽ Stream Data ◽ Detection Techniques ◽ Stream Data Mining ◽ Detection Algorithms

Stream data mining is a popular research area. Concept drift detection and drift handling are the biggest challenges of stream data mining. Several drift detection algorithms have been developed that can accurately detect various drifts but suffer from false-positive drift detection. False-positive drift detection degrades the performance of the classifier because of unnecessary training in between analyses. Classifier ensembles have shown their efficiency for drift detection, drift handling, and classification. However, ensemble classifiers cannot detect the exact position of drift occurrence, so they have to update themselves at some fixed interval, which places an unnecessary computational burden on the system. Combining a drift detection algorithm with an ensemble classifier can improve performance and also solve the problems of false-positive drift detection and unnecessary updating of the ensemble classifier. In this paper, a model is proposed that creates a weighted adaptive ensemble classifier and updates it only when the drift detection method in use gives a drift signal. The proposed model is evaluated on text-based stream data for sentiment analysis and opinion mining, with multiple drift detection algorithms and with multiple classification algorithms as base classifiers for the ensemble. A comparative analysis has been done, and the results show the efficiency of the proposed models.
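The idea of updating a model only when a detector signals drift can be sketched as follows. The abstract does not name its detectors, so this sketch assumes a simplified error-rate detector in the style of DDM; the class `SimpleDDM`, the function `drift_positions`, the thresholds, and the synthetic error stream are all hypothetical and are not the paper's model.

```python
import math


class SimpleDDM:
    """Simplified error-rate drift detector in the style of DDM (an assumption)."""

    def __init__(self, min_samples=30, drift_level=3.0):
        self.min_samples = min_samples
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """Feed 1 for a misclassification, 0 otherwise; return True on drift."""
        self.n += 1
        self.errors += error
        if self.n < self.min_samples:
            return False
        p = self.errors / self.n              # running error rate
        s = math.sqrt(p * (1 - p) / self.n)   # its standard deviation
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s     # remember the best level seen so far
        return p + s > self.p_min + self.drift_level * self.s_min


def drift_positions(error_stream):
    """Return the 1-based positions at which a model update would be triggered."""
    detector = SimpleDDM()
    positions = []
    for i, err in enumerate(error_stream, start=1):
        if detector.update(err):
            positions.append(i)   # an ensemble update would happen here, and only here
            detector.reset()
    return positions


# Synthetic stream: 10% errors for 300 samples, then every prediction is wrong.
stream = [1 if i % 10 == 0 else 0 for i in range(1, 301)] + [1] * 100
drifts = drift_positions(stream)
print(drifts)
```

On this synthetic stream the detector stays silent during the stable phase and fires once shortly after the change, so an ensemble wired to it would be updated once rather than at every fixed interval.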

Classification of Imbalanced Data Stream: Techniques and Challenges

Transactions on Machine Learning and Artificial Intelligence ◽ 10.14738/tmlai.92.9964 ◽ 2021 ◽ Vol 9 (2) ◽ pp. 36-52

Author(s): Mashaal A. Alfhaid ◽ Manal Abdullah

Keyword(s): Data Mining ◽ Data Stream ◽ Concept Drift ◽ Class Imbalance ◽ Imbalanced Data ◽ Predictive Performance ◽ Knowledge Extraction ◽ Streaming Data ◽ Stream Data ◽ Stream Data Mining

As the volume of generated data increases every day, data mining and knowledge extraction have grown in importance. In traditional data mining, knowledge can be extracted offline. Stream data mining is different: data arrive continuously and can be processed only in a single scan, and concept drift may appear. As the pre-processing stage is critical in knowledge extraction, imbalanced stream data have gained significant popularity among researchers in the last few years. Many real-world applications suffer from class imbalance, including medical, business, and fraud detection applications. Supervised learning involves classes, whether binary or multi-class. These classes are often imbalanced, divided into a majority (negative) class and a minority (positive) class, which can cause a bias toward the majority class and skew the predictive performance of models. Handling imbalanced streaming data is mandatory for more accurate and reliable learning models. In this paper, we present an overview of data stream mining and its tools, and summarize the problem of class imbalance and the different approaches to it. In addition, we present the popular evaluation metrics and the challenges arising from imbalanced streaming data.
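The bias toward the majority class described above can be made concrete with a small sketch. The abstract does not list its evaluation metrics, so the use of the geometric mean of sensitivity and specificity (G-mean) here is an assumption, as are the function name `accuracy_and_gmean` and the 95:5 toy labels.

```python
import math


def accuracy_and_gmean(y_true, y_pred):
    """Return (accuracy, G-mean) for binary labels, where 1 is the minority class.

    Assumes both classes occur in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / pos   # recall on the minority (positive) class
    specificity = tn / neg   # recall on the majority (negative) class
    return accuracy, math.sqrt(sensitivity * specificity)


# Toy data: 5 minority samples, 95 majority samples.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100           # a classifier that always predicts the majority class
acc, gmean = accuracy_and_gmean(y_true, y_pred)
print(acc, gmean)
```

The always-majority classifier scores an accuracy of 0.95 but a G-mean of 0, which is why plain accuracy is a misleading measure on imbalanced data.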

Artificial intelligence driven framework for the structurization of free-text diagnostic reports

10.32469/10355/86487 ◽ 2020

Author(s): Pericles Stavros Giannaris

Keyword(s): Artificial Intelligence ◽ Data Mining ◽ Natural Language Processing ◽ Knowledge Representation ◽ Natural Language ◽ Language Processing ◽ Free Text ◽ Stream Data ◽ Stream Data Mining ◽ Starting Point

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT REQUEST OF AUTHOR.] Free-text sections of diagnostic reports contain a wealth of data on patients, diseases, and complex diagnostic processes. However, free-text data are a poor starting point for computer-based analytics. The majority of natural language processing (NLP) based approaches lack the capacity to accurately extract complex diagnostic entities and their relationships, and to provide adequate knowledge representation (KR) for downstream data mining applications. To overcome these limitations, a novel informatics framework is introduced for the analysis of free-text diagnostic reports. The framework is based on artificial intelligence (AI) modeling. Here, AI-based modeling integrates natural language processing information extraction techniques (NLP-IE), ontology-based knowledge representation, n-ary relations according to ontological patterns, and information entropy-based data mining approaches. Diagnostic reports are transformed into knowledge graphs (KGs) of relational triples, with the goal of facilitating their analysis using computers. This informatics framework has the potential to broadly impact diagnostic medicine and to be extended to other biomedical domains as well.

Evolving clustering algorithm based on mixture of typicalities for stream data mining

Future Generation Computer Systems ◽ 10.1016/j.future.2020.01.017 ◽ 2020 ◽ Vol 106 ◽ pp. 672-684 ◽ Cited By ~ 3

Author(s): José Maia ◽ Carlos Alberto Severiano ◽ Frederico Gadelha Guimarães ◽ Cristiano Leite de Castro ◽ André Paim Lemos ◽ ...

Keyword(s): Data Mining ◽ Clustering Algorithm ◽ Stream Data ◽ Stream Data Mining

Stream Data Mining: Algorithms and Their Probabilistic Properties

10.1007/978-3-030-13962-9 ◽ 2020 ◽ Cited By ~ 5

Author(s): Leszek Rutkowski ◽ Maciej Jaworski ◽ Piotr Duda

Keyword(s): Data Mining ◽ Stream Data ◽ Stream Data Mining ◽ Data Mining Algorithms ◽ Mining Algorithms

Corrigendum to ‘How to adjust an ensemble size in stream data mining?’ Information Sciences, vol. 381 (2017), pp. 46-54

Information Sciences ◽ 10.1016/j.ins.2018.11.012 ◽ 2019 ◽ Vol 477 ◽ pp. 545

Author(s): Lena Pietruczuk ◽ Leszek Rutkowski ◽ Maciej Jaworski ◽ Piotr Duda

Keyword(s): Data Mining ◽ Ensemble Size ◽ Stream Data ◽ Stream Data Mining ◽ Information Sciences

On the Hermite Series-Based Generalized Regression Neural Networks for Stream Data Mining

Neural Information Processing - Lecture Notes in Computer Science ◽ 10.1007/978-3-030-36718-3_37 ◽ 2019 ◽ pp. 437-448

Author(s): Danuta Rutkowska ◽ Leszek Rutkowski

Keyword(s): Data Mining ◽ Neural Networks ◽ Stream Data ◽ Stream Data Mining ◽ Generalized Regression Neural Networks ◽ Generalized Regression ◽ Hermite Series

Applications of Stream Data Mining on the Internet of Things: A Survey

2018 International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism (IBIGDELFT) ◽ 10.1109/ibigdelft.2018.8625289 ◽ 2018

Author(s): Emine Rumeysa Guler ◽ Suat Ozdemir

Keyword(s): Data Mining ◽ Internet Of Things ◽ The Internet ◽ Stream Data ◽ Stream Data Mining ◽ The Internet Of Things

Comparative Study of Different Classification Algorithms for Stream Data Mining Using MOA

International Journal of Computer Sciences and Engineering ◽ 10.26438/ijcse/v6i11.614616 ◽ 2018 ◽ Vol 6 (11) ◽ pp. 614-616

Author(s): Ashish P. Joshi ◽ Biraj V. Patel

Keyword(s): Data Mining ◽ Comparative Study ◽ Classification Algorithms ◽ Stream Data ◽ Stream Data Mining

stream data mining
Recently Published Documents

A Clustering Algorithm in Stream Data Using Strong Coreset

Comparative Analysis of Drift Detection Based Adaptive Ensemble Model with Different Drift Detection Techniques

Classification of Imbalanced Data Stream: Techniques and Challenges

Artificial intelligence driven framework for the structurization of free-text diagnostic reports

Evolving clustering algorithm based on mixture of typicalities for stream data mining

Stream Data Mining: Algorithms and Their Probabilistic Properties

Corrigendum to ‘How to adjust an ensemble size in stream data mining?’ Information Sciences, vol. 381 (2017), pp. 46-54

On the Hermite Series-Based Generalized Regression Neural Networks for Stream Data Mining

Applications of Stream Data Mining on the Internet of Things: A Survey

Comparative Study of Different Classification Algorithms for Stream Data Mining Using MOA
