Research on mining global maximal frequent itemsets for health big data

Author(s):  
Bo He ◽  
Jianhui Pei
2021 ◽  
Vol 11 (21) ◽  
pp. 10399
Author(s):  
Yalong Zhang ◽  
Wei Yu ◽  
Qiuqin Zhu ◽  
Xuan Ma ◽  
Hisakazu Ogura

When it comes to association rule mining, all frequent itemsets are first found, and then the confidence level of association rules is calculated through the support degree of frequent itemsets. As all non-empty subsets in frequent itemsets are still frequent itemsets, all frequent itemsets can be acquired only by finding all maximal frequent itemsets (MFIs), whose supersets are not frequent itemsets. In this study, an algorithm, named right-hand side expanding (RHSE), which can accurately find all MFIs, was proposed. First, an Expanding Operation was designed, which, starting from any given frequent itemset, could add items using certain rules and form some supersets of given frequent itemsets. In addition, these supersets were all MFIs. Next, this operator was used to add items by taking all frequent 1-itemsets as the starting point alternately, and all MFIs were found in the end. Due to the special design of the Expanding Operation, each MFI could be found. Moreover, the path found was unique, which avoided the algorithm redundancy in temporal and spatial complexity. This algorithm, which has a high operating rate, is applicable to the big data of high-dimensional mass transactions as it is capable of avoiding the computing redundancy and finding all MFIs. In the end, a detailed experimental report on 10 open standard transaction sets was given in this study, including the big data calculation results of million-class transactions.


2010 ◽  
Vol 26-28 ◽  
pp. 118-122
Author(s):  
Chong Huan Xu ◽  
Chun Hua Ju

According to the features of data streams and combined sliding window, a new algorithm A-MFI which is based on self-adjusting and orderly-compound policy for mining maximal frequent itemsets in data stream is proposed. This algorithm which is based on basic window updates information from data stream flow fragments and scans the stream only once to gain and store it in frequent itemsets list when the data stream flows. The core idea of this algorithm: construct self-adjusting and orderly-compound FP-tree, use mixed subset pruning techniques to reduce the search space, merge nodes which has equal minsup in the same branch and compress to generate the orderly-compound FP-tree to avoid superset checking when mining maximal frequent itemsets. The experimental results show that the algorithm has higher efficiency in time and space, and also has good scalability.


Author(s):  
Padmanathan Anantharaman ◽  
H.V. Ramakrishan

As data volumes continue to grow, they quickly consume the capacity of data warehouses and application databases. Is your IT organization forced into costly upgrades to expensive databases and data warehouse hardware appliances and enormous amount of data is getting explored through Internet of Things (IoT) as technologies are advancing and people uses these technologies in day to day activities, this data is termed as Big Data having its characteristics and challenges. Frequent Itemset Mining algorithms are aimed to disclose frequent itemsets from transactional database but as the dataset size increases, it cannot be handled by traditional frequent itemset mining. MapReduce programming model solves the problem of large datasets but it has large communication cost which reduces execution efficiency. This proposed new pre-processed k-means technique applied on BigFIM algorithm. ClustBigFIM uses hybrid approach, clustering using k-means algorithm to generate Clusters from huge datasets and Apriori and Eclat to mine frequent itemsets from generated clusters using MapReduce programming model. Results shown that execution efficiency of ClustBigFIM algorithm is increased by applying k-means clustering algorithm before BigFIM algorithm as one of the pre-processing technique.


2014 ◽  
Vol 610 ◽  
pp. 291-295
Author(s):  
Qiang Wu ◽  
Ding We Wu ◽  
Qin Wang ◽  
Shao Min Wen ◽  
Rong Tu

In this paper, a novel algorithm for mining maximal frequent itemsets is presented, which has a pre-processing phase where a digraph is constructed. The digraph represents the frequent 2-itemsets which play an important role on the performance of data mining. Then the search for maximal frequent itemsets is done in the digraph. Experiments show that the proposed algorithm is efficient for all types of data.


Sign in / Sign up

Export Citation Format

Share Document