An Efficient Algorithm for Mining Frequent Closed Itemsets over Data Stream

Historically, data mining research has focused on discovering sets of attributes that discriminate data entities into classes, or association rules between attributes. In contrast, we are developing data mining techniques to discover patterns consisting of complex relationships between entities. Our research is particularly applicable to domains in which the data is event-driven, such as counter-terrorism intelligence analysis. In this paper we describe an algorithm designed to operate over relational data received from a continuous stream. Our approach includes a mechanism for summarizing discoveries from previous data increments so that the globally best patterns can be computed by examining only the new data increment. We then describe a method by which relational dependencies that span temporal increment boundaries can be efficiently resolved, so that additional pattern instances that do not reside entirely in a single data increment can be discovered. We also describe a method for change detection using a measure of central tendency designed for graph data. We contrast two formulations of the change detection process and demonstrate the ability to identify salient changes along meaningful dimensions and to recognize trends in a relational data stream.

A Frequent Pattern Conjunction Heuristic for Rule Generation in Data Streams

Information ◽

10.3390/info12010024 ◽

2021 ◽

Vol 12 (1) ◽

pp. 24

Author(s):

Frederic Stahl ◽

Thien Le ◽

Atta Badii ◽

Mohamed Medhat Gaber

Keyword(s):

Data Mining ◽

Real Time ◽

Data Streams ◽

Data Stream ◽

Practical Importance ◽

Rule Induction ◽

Streaming Data ◽

Frequent Pattern ◽

Unseen Data ◽

Rule Sets

This paper introduces a new and expressive algorithm for inducing descriptive rule-sets from streaming data in real time in order to describe frequent patterns explicitly encoded in the stream. Data Stream Mining (DSM) is concerned with the automatic analysis of data streams in real time. Rapid flows of data challenge state-of-the-art processing and communication infrastructure, hence the motivation for research and innovation into real-time algorithms that analyse data streams on the fly and can automatically adapt to concept drifts. To date, DSM techniques have largely focused on predictive data mining applications that aim to forecast the value of a particular target feature of unseen data instances, answering questions such as whether a credit card transaction is fraudulent or not. A real-time, expressive and descriptive Data Mining technique for streaming data has not previously been established as part of the DSM toolkit. This has motivated the work reported in this paper, which has resulted in developing and validating a Generalised Rule Induction (GRI) tool that produces expressive rules as explanations that can be easily understood by human analysts. The expressiveness of decision models in data streams serves the objectives of transparency, underpinning the vision of 'explainable AI', and yet it is an area of research that has attracted less attention despite being of high practical importance. The algorithm introduced and described in this paper is termed Fast Generalised Rule Induction (FGRI). FGRI is able to induce descriptive rules incrementally for raw data from both categorical and numerical features. FGRI is able to adapt rule-sets to changes of the pattern encoded in the data stream (concept drift) on the fly as new data arrives and can thus be applied continuously in real time. The paper also provides a theoretical, qualitative and empirical evaluation of FGRI.

An Efficient Algorithm for Maintaining Frequent Closed Itemsets over Data Stream

Next-Generation Applied Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-642-02568-6_78 ◽

2009 ◽

pp. 767-776 ◽

Cited By ~ 6

Author(s):

Show-Jane Yen ◽

Yue-Shi Lee ◽

Cheng-Wei Wu ◽

Chin-Lin Lin

Keyword(s):

Efficient Algorithm ◽

Data Stream ◽

Closed Itemsets

An Efficient Algorithm in Mining Frequent Itemsets with Weights over Data Stream Using Tree Data Structure

International Journal of Intelligent Systems and Applications ◽

10.5815/ijisa.2015.12.02 ◽

2015 ◽

Vol 7 (12) ◽

pp. 23-31

Author(s):

Long Nguyen Hung ◽

Thuy Nguyen Thi Thu ◽

Giap Cu Nguyen

Keyword(s):

Data Structure ◽

Efficient Algorithm ◽

Data Stream ◽

Frequent Itemsets ◽

Tree Data ◽

Tree Data Structure ◽

Mining Frequent Itemsets

Research on Distributed Data Stream Mining of Financial Risk Based on Double Privacy Protection

10.21203/rs.3.rs-38957/v1 ◽

2020 ◽

Author(s):

Yuhao Zhao

Keyword(s):

Data Mining ◽

Data Streams ◽

Privacy Protection ◽

Data Stream ◽

Financial Risk ◽

Data Stream Mining ◽

Distributed Data ◽

Stream Mining ◽

Mining Technology ◽

Distributed Data Streams

With the advancement of network technology and large-scale computing, distributed data streams have been widely used in financial risk analysis. However, while data mining reveals financial models, it also increasingly poses a threat to privacy. Preventing privacy leakage while mining efficiently therefore poses new challenges for data mining technology. This article addresses the leakage of private data in financial data mining, building on existing data mining technology to study mining and privacy protection together. First, a data mining model for dual privacy protection is defined, which better matches the characteristics of distributed data streams while achieving privacy protection. Secondly, a privacy-oriented data stream mining algorithm is proposed, which uses random interference technology to protect the original sensitive data. Finally, simulation experiments show that the algorithm is feasible and effective: it adapts better to the distribution and dynamic characteristics of distributed data streams, achieves better privacy protection, and effectively reduces communication load.

Research on wireless distributed financial risk data stream mining based on dual privacy protection

10.21203/rs.3.rs-38957/v2 ◽

2020 ◽

Author(s):

Yuhao Zhao

Keyword(s):

Data Mining ◽

Data Streams ◽

Privacy Protection ◽

Data Stream ◽

Financial Risk ◽

Data Stream Mining ◽

Distributed Data ◽

Stream Mining ◽

Mining Technology ◽

Distributed Data Streams

With the advancement of network technology and large-scale computing, distributed data streams have been widely used in financial risk analysis. However, while data mining reveals financial models, it also increasingly poses a threat to privacy. Preventing privacy leakage while mining efficiently therefore poses new challenges for data mining technology. This article addresses the leakage of private data in financial data mining, building on existing data mining technology to study mining and privacy protection together. First, a data mining model for dual privacy protection is defined, which better matches the characteristics of distributed data streams while achieving privacy protection. Secondly, a privacy-oriented data stream mining algorithm is proposed, which uses random interference technology to protect the original sensitive data. Finally, simulation experiments show that the algorithm is feasible and effective: it adapts better to the distribution and dynamic characteristics of distributed data streams, achieves better privacy protection, and effectively reduces communication load.

A Novel Algorithm for Predicting Valuable Items in Data Streams

Journal of Applied Information Science ◽

10.21863/jais/2015.3.2.006 ◽

2015 ◽

Vol 3 (2) ◽

Author(s):

S. Vijayarani Mohan

Keyword(s):

Data Mining ◽

Data Streams ◽

Data Stream ◽

Pattern Mining ◽

Research Work ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Frequent Items ◽

And Performance ◽

Transactional Data

A data stream is a real-time, continuous, structured sequence of data items. Data stream mining is the process of extracting knowledge from data records that arrive rapidly and continuously, which makes mining difficult. Stream mining algorithms are normally designed to scan the data only once, and extracting knowledge in a single scan is a complicated task. Data streams pose a computational challenge for data mining because of the additional algorithmic constraints created by the large volume of data. Popular data mining techniques, namely clustering, classification, and frequent pattern mining, are applied to data streams to extract knowledge. This research work concentrates on predicting the valuable items found in the transactional data of a data stream, whereas most of the literature discusses how frequent items are mined from data streams. Frequent item mining finds the items that occur frequently, i.e. items whose occurrence is above a given threshold are considered frequent. Valuable item mining finds the costliest or most valuable items in a database. Predicting this information gives businesses sales details about the valuable items, which guides crucial decisions such as catalogue drawing, cross promotion, end user shopping, and performance scrutiny. In this research work, a new algorithm named VIM (Valuable Item Mining) is proposed for finding the valuable items in data streams. The performance of this algorithm is analysed in terms of the number of valuable items discovered and the execution time.
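The difference between frequent items and valuable items can be illustrated with a minimal single-pass sketch. This is not the VIM algorithm, whose details the abstract does not give. It assumes each transaction is a list of (item, price) pairs, counts an item once per transaction, treats an item as frequent when its share of transactions is strictly above the threshold, and ranks value by summed price. The names `scan_stream`, `frequent_items`, and `valuable_items` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Assumed layout: a transaction is a list of (item, price) pairs.
Transaction = List[Tuple[str, float]]


def scan_stream(stream: Iterable[Transaction]) -> Tuple[Counter, Dict[str, float], int]:
    """Read every transaction exactly once, keeping only running summaries."""
    counts: Counter = Counter()      # number of transactions containing each item
    revenue: Dict[str, float] = {}   # summed price of each item
    n = 0                            # number of transactions seen so far
    for transaction in stream:
        n += 1
        seen = set()
        for item, price in transaction:
            revenue[item] = revenue.get(item, 0.0) + price
            if item not in seen:     # count an item once per transaction
                counts[item] += 1
                seen.add(item)
    return counts, revenue, n


def frequent_items(counts: Counter, n: int, threshold: float) -> List[str]:
    """Items whose share of transactions is strictly above the threshold."""
    return sorted(item for item, c in counts.items() if c / n > threshold)


def valuable_items(revenue: Dict[str, float], k: int) -> List[str]:
    """The k items with the highest summed price (ties broken by name)."""
    ranked = sorted(revenue.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]
```

In a stream where a cheap item appears in every transaction and an expensive item appears once, the first function returns the cheap item and the second ranks the expensive one first, which is the distinction the abstract draws.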

Research on wireless distributed financial risk data stream mining based on dual privacy protection

EURASIP Journal on Wireless Communications and Networking ◽

10.1186/s13638-020-01842-x ◽

2020 ◽

Vol 2020 (1) ◽

Author(s):

Yuhao Zhao

Keyword(s):

Data Mining ◽

Data Streams ◽

Privacy Protection ◽

Data Stream ◽

Financial Risk ◽

Data Stream Mining ◽

Distributed Data ◽

Stream Mining ◽

Mining Technology ◽

Distributed Data Streams

With the advancement of network technology and large-scale computing, distributed data streams have been widely used in financial risk analysis. However, while data mining reveals financial models, it also increasingly poses a threat to privacy. Preventing privacy leakage while mining efficiently therefore poses new challenges for data mining technology. This article addresses the leakage of private data in financial data mining, building on existing data mining technology to study mining and privacy protection together. First, a data mining model for dual privacy protection is defined, which better matches the characteristics of distributed data streams while achieving privacy protection. Secondly, a privacy-oriented data stream mining algorithm is proposed, which uses random interference technology to protect the original sensitive data. Finally, simulation experiments show that the algorithm is feasible and effective: it adapts better to the distribution and dynamic characteristics of distributed data streams, achieves better privacy protection, and effectively reduces communication load.
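The abstract does not specify how its random interference is applied. As a generic illustration of the idea only, the sketch below adds zero-mean Gaussian noise to each sensitive value before it leaves a node, so individual values are masked while aggregates such as the mean stay close to the original. The function names, the choice of Gaussian noise, and the `scale` parameter are all assumptions, not the paper's method.

```python
import random
from statistics import fmean
from typing import List


def perturb(values: List[float], scale: float, rng: random.Random) -> List[float]:
    """Mask each value with independent zero-mean Gaussian noise (assumed scheme)."""
    return [v + rng.gauss(0.0, scale) for v in values]


def mean_shift(original: List[float], perturbed: List[float]) -> float:
    """Absolute difference between the means before and after perturbation."""
    return abs(fmean(perturbed) - fmean(original))
```

Because the noise has zero mean, its effect on the average shrinks as more values are aggregated, while a larger `scale` hides individual values more strongly at the cost of noisier mining results.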
