Efficient Implementations for UWEP Incremental Frequent Itemset Mining Algorithm

Association rule mining is a common technique for discovering interesting frequent patterns in data acquired in various application domains. The search space explodes combinatorially as the size of the data increases. Furthermore, the introduction of new data can invalidate old frequent patterns and introduce new ones. Hence, while finding association rules efficiently is an important problem, maintaining and updating them is also crucial. Several algorithms have been introduced to find association rules efficiently; one of them is Apriori. There are also algorithms written to update or maintain existing association rules. Update with early pruning (UWEP) is one such algorithm. In this paper, the authors propose that under certain conditions it is preferable to use an incremental algorithm rather than the classic Apriori algorithm. They also propose new implementation techniques and improvements to the original UWEP paper in an algorithm they call UWEP2. These include the use of memoization and lazy evaluation to reduce scans of the dataset.
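The incremental idea in this abstract can be illustrated with a short sketch. It is not UWEP or UWEP2 and does not reproduce their pruning; it only assumes that support counts are cached (memoized) so that an update scans the new transactions alone. The class name `IncrementalCounts`, the `max_size` cap and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size=2):
    """Count every itemset of up to max_size items in one pass over the data."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return counts


class IncrementalCounts:
    """Cached support counts that absorb new transactions without rescanning old ones."""

    def __init__(self, transactions, max_size=2):
        self.max_size = max_size
        self.n = len(transactions)
        self.counts = count_itemsets(transactions, max_size)

    def update(self, increment):
        # Only the increment is scanned; Counter.update adds to the cached counts.
        self.counts.update(count_itemsets(increment, self.max_size))
        self.n += len(increment)

    def frequent(self, min_support):
        return {s for s, c in self.counts.items() if c / self.n >= min_support}


old = [["a", "b"], ["a", "c"], ["a", "b", "c"]]
inc = IncrementalCounts(old)
before = inc.frequent(0.5)
inc.update([["b", "c"], ["b", "c"]])
after = inc.frequent(0.5)
```

In this toy run, {a, b} is frequent before the update and infrequent after it, while {b, c} becomes frequent only after the update, which is the invalidation effect the abstract describes. Storing every count is a deliberate simplification and would not scale to large item universes.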


Association Rule Mining in Collaborative Filtering

Collaborative Filtering Using Data Mining and Analysis - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-0489-4.ch009 ◽

2017 ◽

pp. 159-179 ◽

Cited By ~ 8

Author(s):

Carson K.-S. Leung ◽

Fan Jiang ◽

Edson M. Dela Cruz ◽

Vijay Sekar Elango

Keyword(s):

Data Mining ◽

Collaborative Filtering ◽

Association Rules ◽

Data Structures ◽

Association Rule ◽

Association Rule Mining ◽

Real Life ◽

Frequent Patterns ◽

Rule Mining ◽

Association Rule Miner

Collaborative filtering uses data mining and analysis to develop a system that helps users make appropriate decisions in real-life applications by removing redundant information and providing valuable information to users. Data mining aims to extract from data implicit, previously unknown and potentially useful information, such as association rules that reveal relationships between frequently co-occurring patterns in the antecedent and consequent parts of the rules. This chapter presents an algorithm called CF-Miner for collaborative filtering with an association rule miner. The CF-Miner algorithm first constructs bitwise data structures to capture important contents in the data. It then finds frequent patterns from the bitwise structures. Based on the mined frequent patterns, the algorithm forms association rules. Finally, the algorithm ranks the mined association rules to recommend appropriate merchandise products, goods or services to users. Evaluation results show the effectiveness of CF-Miner in using association rule mining in collaborative filtering.
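The abstract does not describe CF-Miner's bitwise structures in detail, so the following is only a generic sketch of how a bitwise layout can support frequent-pattern counting. It assumes one integer bitmap per item, with bit i set when transaction i contains the item; the function names and the toy data are hypothetical.

```python
def build_bitmaps(transactions):
    """Map each item to an int whose bit i is set when transaction i contains the item."""
    bitmaps = {}
    for i, t in enumerate(transactions):
        for item in set(t):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps


def support_count(bitmaps, itemset):
    """Count the transactions containing every item by AND-ing the item bitmaps."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must not be empty")
    acc = bitmaps.get(items[0], 0)
    for item in items[1:]:
        acc &= bitmaps.get(item, 0)  # unknown items contribute an empty bitmap
    return bin(acc).count("1")


transactions = [
    ["milk", "bread"],
    ["milk", "eggs"],
    ["milk", "bread", "eggs"],
    ["bread"],
]
bitmaps = build_bitmaps(transactions)
both = support_count(bitmaps, ["milk", "bread"])
```

With this layout the support of an itemset is a chain of bitwise ANDs followed by a population count, so the original transactions are not rescanned for each candidate.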


Boosting association rule mining in large datasets via Gibbs sampling

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1604553113 ◽

2016 ◽

Vol 113 (18) ◽

pp. 4958-4963 ◽

Cited By ~ 5

Author(s):

Guoqi Qian ◽

Calyampudi Radhakrishna Rao ◽

Xiaoying Sun ◽

Yuehua Wu

Keyword(s):

Gibbs Sampling ◽

Association Rules ◽

Association Rule ◽

Association Rule Mining ◽

General Rule ◽

Search Space ◽

Stochastic Search ◽

Importance Measure ◽

Rule Mining ◽

Ergodic Markov Chain

Current algorithms for association rule mining from transaction data are mostly deterministic and enumerative. They can be computationally intractable even for mining a dataset containing just a few hundred transaction items, if no action is taken to constrain the search space. In this paper, we develop a Gibbs-sampling–induced stochastic search procedure to randomly sample association rules from the itemset space, and perform rule mining from the reduced transaction dataset generated by the sample. Also a general rule importance measure is proposed to direct the stochastic search so that, as a result of the randomly generated association rules constituting an ergodic Markov chain, the overall most important rules in the itemset space can be uncovered from the reduced dataset with probability 1 in the limit. In the simulation study and a real genomic data example, we show how to boost association rule mining by an integrated use of the stochastic search and the Apriori algorithm.


Frequent Pattern Discovery and Association Rule Mining of XML Data

Data Mining ◽

10.4018/978-1-4666-2455-9.ch044 ◽

2013 ◽

pp. 859-879

Author(s):

Qin Ding ◽

Gnanasekaran Sundarraj

Keyword(s):

Association Rules ◽

Association Rule ◽

Association Rule Mining ◽

Pattern Discovery ◽

Frequent Pattern ◽

Future Research ◽

Frequent Patterns ◽

Rule Mining ◽

Xml Data ◽

Art Research

Finding frequent patterns and association rules in large data has become a very important task in data mining. Various algorithms have been proposed to solve such problems, but most algorithms are only applicable to relational data. With the increasing use and popularity of XML representation, it is of importance yet challenging to find solutions to frequent pattern discovery and association rule mining of XML data. The challenge comes from the complexity of the structure in XML data. In this chapter, we provide an overview of the state-of-the-art research in content-based and structure-based mining of frequent patterns and association rules from XML data. We also discuss the challenges and issues, and provide our insight for solutions and future research directions.


Frequent Pattern Discovery and Association Rule Mining of XML Data

Advances in Data Mining and Database Management - XML Data Mining ◽

10.4018/978-1-61350-356-0.ch011 ◽

2011 ◽

pp. 243-263 ◽

Cited By ~ 1

Author(s):

Qin Ding ◽

Gnanasekaran Sundarraj

Keyword(s):

Association Rules ◽

Association Rule ◽

Association Rule Mining ◽

Pattern Discovery ◽

Frequent Pattern ◽

Future Research ◽

Frequent Patterns ◽

Rule Mining ◽

Xml Data ◽

Art Research


Mining Association Rules: A Case Study on Benchmark Dense Data

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v3.i3.pp546-553 ◽

2016 ◽

Vol 3 (3) ◽

pp. 546 ◽

Cited By ~ 2

Author(s):

Mustafa Bin Man ◽

Wan Aezwani Wan Abu Bakar ◽

Zailani Abdullah ◽

Masita@Masila Abd Jalil ◽

Tutut Herawan

Keyword(s):

Association Rules ◽

Association Rule ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Data Repository ◽

Rule Mining ◽

Itemset Mining ◽

Major Attention ◽

Performance Results

Data mining is the process of discovering knowledge and previously unknown patterns from large amounts of data. Association rule mining (ARM) has been in trend, as new pattern analyses can be discovered to support important predictions about many issues. Since the first introduction of frequent itemset mining, it has received major attention among researchers, and various efficient and sophisticated algorithms have been proposed for it. Among the best-known algorithms are Apriori and FP-Growth. In this paper, we explore these algorithms and compare their results in generating association rules based on benchmark dense datasets. The datasets are taken from the frequent itemset mining data repository. The two algorithms are implemented in Rapid Miner 5.3.007 and the performance results are shown as a comparison. FP-Growth is found to be the better algorithm under the support-confidence framework.
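For readers unfamiliar with Apriori, a minimal level-wise sketch follows. It is a textbook-style illustration, not the Rapid Miner implementation used in the paper; the function name `apriori`, the absolute `min_count` threshold and the toy data are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for itemsets found in at least min_count transactions."""
    tsets = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in tsets:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count step: one pass over the transactions per level.
        counts = {c: sum(1 for t in tsets if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(current)
        k += 1
    return frequent


data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
result = apriori(data, min_count=3)
```

Each level needs a fresh pass over the transactions to count candidates, which is the repeated-scan cost that FP-Growth avoids by working from a compressed prefix tree.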


A Comparative Study of Tree-Based and Apriori-Based Approaches for Incremental Data Mining

International Journal of Engineering Research in Africa ◽

10.4028/www.scientific.net/jera.23.120 ◽

2016 ◽

Vol 23 ◽

pp. 120-130

Author(s):

Manoj Kumar ◽

Hemant Kumar Soni

Keyword(s):

Data Mining ◽

Association Rules ◽

Association Rule ◽

Association Rule Mining ◽

Future Research ◽

Frequent Patterns ◽

Rule Mining ◽

Business Decisions ◽

Depth Analysis ◽

Intelligent Tools

Association rule mining is an iterative and interactive process of discovering valid, novel, useful, understandable and hidden associations from massive databases. Colossal databases require powerful and intelligent tools for the analysis and discovery of frequent patterns and association rules. Researchers have proposed many algorithms for generating itemsets and association rules, for discovering frequent patterns, and for mining association rules. These proposals are validated on static data. A dynamic database may introduce new association rules, which may be interesting and helpful in making better business decisions. In association rule mining, the performance and cost of the existing algorithms on incremental data are less explored. Hence, there is a strong need for a comprehensive study and in-depth analysis of the existing proposals for association rule mining. In this paper, the existing tree-based algorithms for incremental data mining are presented and compared on the basis of number of scans, structure, size and type of database. It is concluded that the Can-Tree approach dominates other algorithms such as FP-Tree, FUFP-Tree and the FELINE algorithm with CATS-Tree. This study also highlights some hot issues and future research directions, and points out that there is a strong need for devising a new and efficient algorithm for incremental data mining.


Association rules mining between service demands and remanufacturing services

Artificial intelligence for engineering design analysis and manufacturing ◽

10.1017/s0890060420000396 ◽

2020 ◽

pp. 1-11

Author(s):

Wenbin Zhou ◽

Xuhui Xia ◽

Zelin Zhang ◽

Lei Wang

Keyword(s):

Association Rules ◽

Association Rule ◽

Association Rule Mining ◽

Ant Colony Algorithm ◽

Particle Swarm ◽

Frequent Itemset ◽

Ant Colony ◽

Particle Swarm Algorithm ◽

Binary Particle Swarm Optimization ◽

Rule Mining

The potential relationship between service demands and remanufacturing services (RMS) is essential for deciding on an RMS plan accurately and for improving efficiency and benefit. In traditional association rule mining methods, a large number of candidate sets affects the mining efficiency, and the results are not easy for customers to understand. Therefore, a mining method based on a binary particle swarm optimization ant colony algorithm is proposed to discover association rules between service demands and remanufacturing services. This method preprocesses the RMS records, converts them into a binary matrix, and uses the improved ant colony algorithm to mine the maximum frequent itemset. Because the particle swarm algorithm determines the initial pheromone concentration of the ant colony, it avoids the blindness of the ant colony, effectively enhances the searchability of the algorithm, and makes association rule mining faster and more accurate. Finally, a set of historical RMS record data for a straightening machine is used to test the validity and feasibility of this method by extracting valid association rules to guide the design of an RMS scheme for straightening machine parts.


Using the interestingness measure lift to generate association rules

Journal of Advanced Computer Science & Technology ◽

10.14419/jacst.v4i1.4398 ◽

2015 ◽

Vol 4 (1) ◽

pp. 156 ◽

Cited By ~ 4

Author(s):

Nada Hussein ◽

Abdallah Alashqur ◽

Bilal Sowan

Keyword(s):

Data Mining ◽

Association Rules ◽

Association Rule ◽

Search Space ◽

The Other ◽

Frequent Patterns ◽

New Approach ◽

Left Hand ◽

Interestingness Measure ◽

The Right

In this digital age, organizations have to deal with huge amounts of data, sometimes called Big Data. In recent years, the volume of data has increased substantially. Consequently, finding efficient and automated techniques for discovering useful patterns and relationships in the data has become very important. In data mining, patterns and relationships can be represented in the form of association rules. Current techniques for discovering association rules rely on measures such as support for finding frequent patterns and confidence for finding association rules. A shortcoming of confidence is that it does not capture the correlation that exists between the left-hand side (LHS) and the right-hand side (RHS) of an association rule. The interestingness measure lift, on the other hand, captures such correlation in the sense that it tells us whether the LHS influences the RHS positively or negatively. Therefore, using lift instead of confidence as a criterion for discovering association rules can be more effective. It also gives the user more choices in determining the kind of association rules to be discovered. This in turn helps to narrow down the search space and, consequently, improves performance. In this paper, we describe a new approach for discovering association rules that is based on lift and not on confidence.
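The contrast between confidence and lift can be seen in a small worked example. It uses the standard definitions, confidence(L → R) = support(L ∪ R) / support(L) and lift(L → R) = confidence(L → R) / support(R); it does not reproduce the paper's rule-generation approach, and the function names and toy transactions are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= frozenset(t)) / len(transactions)


def confidence(transactions, lhs, rhs):
    """Estimated probability of rhs given lhs."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


def lift(transactions, lhs, rhs):
    """Confidence divided by the baseline frequency of rhs; 1.0 means independence."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


# 10 transactions: coffee appears in 9, tea in 4, both in 3.
transactions = [["tea", "coffee"]] * 3 + [["tea"]] + [["coffee"]] * 6

conf_tea_coffee = confidence(transactions, ["tea"], ["coffee"])
lift_tea_coffee = lift(transactions, ["tea"], ["coffee"])
```

Here the rule tea → coffee has a confidence of 0.75, which looks strong, yet its lift is about 0.83. A value below 1 means that buying tea makes coffee less likely than its 0.9 baseline, which is the negative influence that confidence alone hides.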


Distributed elephant herding optimization for grid-based privacy association rule mining

Data Technologies and Applications ◽

10.1108/dta-07-2019-0104 ◽

2020 ◽

Vol 54 (3) ◽

pp. 365-382

Author(s):

Praveen Kumar Gopagoni ◽

Mohan Rao S K

Keyword(s):

Association Rules ◽

Optimization Algorithm ◽

Association Rule ◽

Association Rule Mining ◽

Frequent Itemset ◽

Association Mining ◽

Apriori Algorithm ◽

Rule Mining ◽

Content Type ◽

Grid Based

Purpose: Association rule mining generates patterns and correlations from a database, which requires long scanning times, and the computational cost of generating the rules is high. The candidate rules generated by traditional association rule mining also face a major challenge in terms of time and space, and the process is lengthy. To tackle the issues of the existing methods and to render the privacy rules, the paper proposes grid-based privacy association rule mining.

Design/methodology/approach: The primary intention of the research is to design and develop a distributed elephant herding optimization (EHO) for grid-based privacy association rule mining from the database. The proposed method of rule generation proceeds in two steps. In the first step, the rules are generated using the apriori algorithm, an effective association rule mining algorithm. In general, the extraction of association rules from the input database is based on confidence and support; in the proposed model, these are replaced with probability-based confidence and holo-entropy. In the second step, the generated rules are given to the grid-based privacy rule mining, which produces privacy-dependent rules based on a novel optimization algorithm and grid-based fitness. The novel optimization algorithm is developed by integrating the distributed concept into the EHO algorithm.

Findings: The method was evaluated on databases taken from the Frequent Itemset Mining Dataset Repository, namely the retail, chess, T10I4D100K and T40I10D100K databases, to prove the effectiveness of the distributed grid-based privacy association rule mining. The proposed method outperformed the existing methods by offering a higher degree of privacy and utility; moreover, the distributed nature of the association rule mining facilitates parallel processing and generates the privacy rules without much computational burden. The rate of hiding capacity, the rate of information preservation and the rate of false rules generated for the proposed method are found to be 0.4468, 0.4488 and 0.0654, respectively, which is better compared with the existing rule mining methods.

Originality/value: Data mining is performed in a distributed manner through grids that subdivide the input data, and the rules are framed using apriori-based association mining, a modification of the standard apriori algorithm in which holo-entropy and probability-based confidence replace support and confidence. The mined rules do not assure privacy; hence, grid-based privacy rules are employed that use the adaptive elephant herding optimization (AEHO) to generate the privacy rules. The AEHO inherits the adaptive nature in the standard EHO, which renders the global optimal solution.


Binary Particle Swarm Optimization-Based Association Rule Mining for Discovering Relationships between Machine Capabilities and Product Features

Mathematical Problems in Engineering ◽

10.1155/2018/2456010 ◽

2018 ◽

Vol 2018 ◽

pp. 1-16 ◽

Cited By ~ 2

Author(s):

Zhicong Kou ◽

Lifeng Xi

Keyword(s):

Particle Swarm Optimization ◽

Association Rules ◽

Association Rule ◽

Association Rule Mining ◽

Particle Swarm ◽

Performance Comparison ◽

Binary Particle Swarm Optimization ◽

Rule Mining ◽

Swarm Optimization ◽

Product Features

An effective data mining method to automatically extract association rules between manufacturing capabilities and product features from the available historical data is essential for efficient and cost-effective product development and production. This paper proposes a new binary particle swarm optimization- (BPSO-) based association rule mining (BPSO-ARM) method for discovering the hidden relationships between machine capabilities and product features. In particular, BPSO-ARM does not need predefined thresholds of minimum support and confidence, which improves its applicability in real-world industrial cases. Moreover, a novel overlapping measure indication is proposed to eliminate lower-quality rules and further improve the applicability of BPSO-ARM. The effectiveness of BPSO-ARM is demonstrated on a benchmark case and an industrial case from automotive part manufacturing. The performance comparison indicates that BPSO-ARM outperforms other regular methods (e.g., Apriori) for ARM. The experimental results indicate that BPSO-ARM is capable of discovering important association rules between machine capabilities and product features. This will help support planners and engineers in new product design and manufacturing.
