Efficient Implementations for UWEP Incremental Frequent Itemset Mining Algorithm

2021 ◽  
Vol 11 (1) ◽  
pp. 18-37
Author(s):  
Mehmet Bicer ◽  
Daniel Indictor ◽  
Ryan Yang ◽  
Xiaowen Zhang

Association rule mining is a common technique for discovering interesting frequent patterns in data acquired in various application domains. The search space explodes combinatorially as the size of the data increases. Furthermore, the introduction of new data can invalidate old frequent patterns and introduce new ones. Hence, while finding association rules efficiently is an important problem, maintaining and updating them is also crucial. Several algorithms have been introduced to find association rules efficiently; one of them is Apriori. There are also algorithms written to update or maintain existing association rules. Update with early pruning (UWEP) is one such algorithm. In this paper, the authors propose that under certain conditions it is preferable to use an incremental algorithm rather than the classic Apriori algorithm. They also propose new implementation techniques and improvements to the original UWEP algorithm, in an algorithm they call UWEP2. These include the use of memoization and lazy evaluation to reduce scans of the dataset.
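
As a rough illustration of how caching and lazy evaluation can reduce dataset scans, the sketch below counts the support of an itemset only when it is first requested and caches the result. It is not the UWEP2 implementation; the helper `make_support_counter` and the toy transactions are hypothetical names and data chosen for this example.

```python
from functools import lru_cache

def make_support_counter(transactions):
    """Return (support, scans): a cached support function and a scan counter.

    Hypothetical helper for illustration only, not part of UWEP or UWEP2.
    """
    data = [frozenset(t) for t in transactions]
    scans = {"count": 0}  # number of full passes over the data

    @lru_cache(maxsize=None)
    def support(itemset):
        # Lazy: the dataset is scanned only when an itemset is first requested.
        scans["count"] += 1
        return sum(1 for t in data if itemset <= t)

    return support, scans

transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
support, scans = make_support_counter(transactions)

ab = frozenset({"a", "b"})
first = support(ab)   # triggers one scan of the data
second = support(ab)  # served from the cache, no additional scan
```

Itemsets are passed as `frozenset` values so that they are hashable and can serve as cache keys.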

Author(s):  
Carson K.-S. Leung ◽  
Fan Jiang ◽  
Edson M. Dela Cruz ◽  
Vijay Sekar Elango

Collaborative filtering uses data mining and analysis to develop a system that helps users make appropriate decisions in real-life applications by removing redundant information and providing valuable information to users. Data mining aims to extract from data implicit, previously unknown and potentially useful information, such as association rules that reveal relationships between frequently co-occurring patterns in the antecedent and consequent parts of the rules. This chapter presents an algorithm called CF-Miner for collaborative filtering with an association rule miner. The CF-Miner algorithm first constructs bitwise data structures to capture important contents in the data. It then finds frequent patterns from the bitwise structures. Based on the mined frequent patterns, the algorithm forms association rules. Finally, the algorithm ranks the mined association rules to recommend appropriate merchandise products, goods or services to users. Evaluation results show the effectiveness of CF-Miner in using association rule mining in collaborative filtering.
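
The abstract does not specify the layout of CF-Miner's bitwise structures. One common way to mine with bit vectors, shown here only as a sketch under that assumption, is to keep one integer bitmap per item (bit i set when transaction i contains the item) and to obtain an itemset's support by AND-ing the bitmaps and counting the set bits. The function names and sample data are hypothetical.

```python
def build_bitmaps(transactions):
    """Map each item to an integer whose bit i is set if transaction i has it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps

def support(bitmaps, itemset, n_transactions):
    """Count transactions containing every item by AND-ing the item bitmaps."""
    mask = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)  # unknown items match no transaction
    return bin(mask).count("1")

transactions = [
    {"milk", "bread"},
    {"milk"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
]
bitmaps = build_bitmaps(transactions)
milk_bread = support(bitmaps, {"milk", "bread"}, len(transactions))
bread_eggs = support(bitmaps, {"bread", "eggs"}, len(transactions))
```

With this representation, the support of a larger itemset costs one bitwise AND per item instead of a pass over the transactions.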


2016 ◽  
Vol 113 (18) ◽  
pp. 4958-4963 ◽  
Author(s):  
Guoqi Qian ◽  
Calyampudi Radhakrishna Rao ◽  
Xiaoying Sun ◽  
Yuehua Wu

Current algorithms for association rule mining from transaction data are mostly deterministic and enumerative. They can be computationally intractable even for mining a dataset containing just a few hundred transaction items, if no action is taken to constrain the search space. In this paper, we develop a Gibbs-sampling–induced stochastic search procedure to randomly sample association rules from the itemset space, and perform rule mining from the reduced transaction dataset generated by the sample. Also a general rule importance measure is proposed to direct the stochastic search so that, as a result of the randomly generated association rules constituting an ergodic Markov chain, the overall most important rules in the itemset space can be uncovered from the reduced dataset with probability 1 in the limit. In the simulation study and a real genomic data example, we show how to boost association rule mining by an integrated use of the stochastic search and the Apriori algorithm.


Data Mining ◽  
2013 ◽  
pp. 859-879
Author(s):  
Qin Ding ◽  
Gnanasekaran Sundarraj

Finding frequent patterns and association rules in large data has become a very important task in data mining. Various algorithms have been proposed to solve such problems, but most algorithms are only applicable to relational data. With the increasing use and popularity of XML representation, it is of importance yet challenging to find solutions to frequent pattern discovery and association rule mining of XML data. The challenge comes from the complexity of the structure in XML data. In this chapter, we provide an overview of the state-of-the-art research in content-based and structure-based mining of frequent patterns and association rules from XML data. We also discuss the challenges and issues, and provide our insight for solutions and future research directions.


Author(s):  
Mustafa Bin Man ◽  
Wan Aezwani Wan Abu Bakar ◽  
Zailani Abdullah ◽  
Masita@Masila Abd Jalil ◽  
Tutut Herawan

Data mining is the process of discovering knowledge and previously unknown patterns from large amounts of data. Association rule mining (ARM) has been trending as a way to discover new patterns that support important predictions. Since the first introduction of frequent itemset mining, it has received major attention from researchers, and various efficient and sophisticated algorithms have been proposed for it. Among the best-known algorithms are Apriori and FP-Growth. In this paper, we explore these algorithms and compare their results in generating association rules on benchmark dense datasets. The datasets are taken from the frequent itemset mining data repository. The two algorithms are implemented in Rapid Miner 5.3.007 and their performance results are compared. FP-Growth is found to be the better algorithm under the support-confidence framework.


Author(s):  
Manoj Kumar ◽  
Hemant Kumar Soni

Association rule mining is an iterative and interactive process of discovering valid, novel, useful, understandable and hidden associations in massive databases. Colossal databases require powerful and intelligent tools for the analysis and discovery of frequent patterns and association rules. Researchers have proposed many algorithms for generating itemsets, discovering frequent patterns, and mining association rules. These proposals are validated on static data. A dynamic database may introduce new association rules, which may be interesting and helpful in making better business decisions. In association rule mining, the performance and cost of existing algorithms on incremental data are less explored. Hence, there is a strong need for a comprehensive study and in-depth analysis of the existing proposals for association rule mining. In this paper, the existing tree-based algorithms for incremental data mining are presented and compared on the basis of number of scans, structure, size and type of database. It is concluded that the Can-Tree approach dominates the other algorithms, such as FP-Tree, FUFP-Tree and the FELINE algorithm with CATS-Tree. This study also highlights some hot issues and future research directions, and points out that there is a strong need for a new, efficient algorithm for incremental data mining.


Author(s):  
Wenbin Zhou ◽  
Xuhui Xia ◽  
Zelin Zhang ◽  
Lei Wang

The potential relationship between service demands and remanufacturing services (RMS) is essential for deciding on an RMS plan accurately and for improving efficiency and benefit. In traditional association rule mining methods, a large number of candidate sets reduces mining efficiency, and the results are not easy for customers to understand. Therefore, a mining method based on a binary particle swarm optimization ant colony algorithm is proposed to discover association rules between service demands and remanufacturing services. This method preprocesses the RMS records, converts them into a binary matrix, and uses the improved ant colony algorithm to mine the maximum frequent itemset. Because the particle swarm algorithm determines the initial pheromone concentration of the ant colony, it avoids the blindness of the ant colony, effectively enhances the searchability of the algorithm, and makes association rule mining faster and more accurate. Finally, a set of historical RMS record data for a straightening machine is used to test the validity and feasibility of this method by extracting valid association rules to guide the design of an RMS scheme for straightening machine parts.


2015 ◽  
Vol 4 (1) ◽  
pp. 156 ◽  
Author(s):  
Nada Hussein ◽  
Abdallah Alashqur ◽  
Bilal Sowan

In this digital age, organizations have to deal with huge amounts of data, sometimes called Big Data. In recent years, the volume of data has increased substantially. Consequently, finding efficient and automated techniques for discovering useful patterns and relationships in the data has become very important. In data mining, patterns and relationships can be represented in the form of association rules. Current techniques for discovering association rules rely on measures such as support for finding frequent patterns and confidence for finding association rules. A shortcoming of confidence is that it does not capture the correlation that exists between the left-hand side (LHS) and the right-hand side (RHS) of an association rule. On the other hand, the interestingness measure lift captures such a correlation in the sense that it tells us whether the LHS influences the RHS positively or negatively. Therefore, using lift instead of confidence as a criterion for discovering association rules can be more effective. It also gives the user more choices in determining the kind of association rules to be discovered. This in turn helps to narrow down the search space and, consequently, improves performance. In this paper, we describe a new approach for discovering association rules that is based on lift rather than on confidence.
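
To make the difference between the two measures concrete, the sketch below uses the standard definitions confidence(LHS → RHS) = supp(LHS ∪ RHS) / supp(LHS) and lift(LHS → RHS) = confidence(LHS → RHS) / supp(RHS), with support expressed as a fraction of transactions. The transactions are made-up illustrative data, and the code is not the approach proposed in the paper.

```python
def supp(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Confidence of the rule lhs -> rhs."""
    return supp(transactions, set(lhs) | set(rhs)) / supp(transactions, lhs)

def lift(transactions, lhs, rhs):
    """Lift of lhs -> rhs: above 1 positive, below 1 negative influence."""
    return confidence(transactions, lhs, rhs) / supp(transactions, rhs)

# Made-up data: coffee appears in 9 of 10 baskets, tea in 4, both in 3.
transactions = (
    [{"tea", "coffee"}] * 3
    + [{"tea"}] * 1
    + [{"coffee"}] * 6
)

conf_value = confidence(transactions, {"tea"}, {"coffee"})
lift_value = lift(transactions, {"tea"}, {"coffee"})
```

Here the rule tea → coffee has a confidence of 0.75, which looks strong, but its lift is below 1 because coffee is already present in 90% of all baskets. This is the situation in which confidence alone is misleading.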


2020 ◽  
Vol 54 (3) ◽  
pp. 365-382
Author(s):  
Praveen Kumar Gopagoni ◽  
Mohan Rao S K

Purpose: Association rule mining generates the patterns and correlations from the database, which requires large scanning time, and the cost of computation associated with the generation of the rules is quite high. On the other hand, the candidate rules generated using the traditional association rules mining face a huge challenge in terms of time and space, and the process is lengthy. In order to tackle the issues of the existing methods and to render the privacy rules, the paper proposes the grid-based privacy association rule mining.

Design/methodology/approach: The primary intention of the research is to design and develop a distributed elephant herding optimization (EHO) for grid-based privacy association rule mining from the database. The proposed method of rule generation is processed as two steps: in the first step, the rules are generated using apriori algorithm, which is the effective association rule mining algorithm. In general, the extraction of the association rules from the input database is based on confidence and support that is replaced with new terms, such as probability-based confidence and holo-entropy. Thus, in the proposed model, the extraction of the association rules is based on probability-based confidence and holo-entropy. In the second step, the generated rules are given to the grid-based privacy rule mining, which produces privacy-dependent rules based on a novel optimization algorithm and grid-based fitness. The novel optimization algorithm is developed by integrating the distributed concept in EHO algorithm.

Findings: The experimentation of the method using the databases taken from the Frequent Itemset Mining Dataset Repository to prove the effectiveness of the distributed grid-based privacy association rule mining includes the retail, chess, T10I4D100K and T40I10D100K databases. The proposed method outperformed the existing methods through offering a higher degree of privacy and utility, and moreover, it is noted that the distributed nature of the association rule mining facilitates the parallel processing and generates the privacy rules without much computational burden. The rate of hiding capacity, the rate of information preservation and rate of the false rules generated for the proposed method are found to be 0.4468, 0.4488 and 0.0654, respectively, which is better compared with the existing rule mining methods.

Originality/value: Data mining is performed in a distributed manner through the grids that subdivide the input data, and the rules are framed using the apriori-based association mining, which is the modification of the standard apriori with the holo-entropy and probability-based confidence replacing the support and confidence in the standard apriori algorithm. The mined rules do not assure the privacy, and hence, the grid-based privacy rules are employed that utilize the adaptive elephant herding optimization (AEHO) for generating the privacy rules. The AEHO inherits the adaptive nature in the standard EHO, which renders the global optimal solution.


2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Zhicong Kou ◽  
Lifeng Xi

An effective data mining method to automatically extract association rules between manufacturing capabilities and product features from the available historical data is essential for an efficient and cost-effective product development and production. This paper proposes a new binary particle swarm optimization- (BPSO-) based association rule mining (BPSO-ARM) method for discovering the hidden relationships between machine capabilities and product features. In particular, BPSO-ARM does not need to predefine thresholds of minimum support and confidence, which improves its applicability in real-world industrial cases. Moreover, a novel overlapping measure indication is further proposed to eliminate those lower quality rules to further improve the applicability of BPSO-ARM. The effectiveness of BPSO-ARM is demonstrated on a benchmark case and an industrial case about the automotive part manufacturing. The performance comparison indicates that BPSO-ARM outperforms other regular methods (e.g., Apriori) for ARM. The experimental results indicate that BPSO-ARM is capable of discovering important association rules between machine capabilities and product features. This will help support planners and engineers for the new product design and manufacturing.

