Reasoning about Frequent Patterns with Negation

Author(s):  
Marzena Kryszkiewicz

Discovering frequent patterns in large databases is an important data mining problem. The problem was introduced in (Agrawal, Imielinski & Swami, 1993) for a sales transaction database. Frequent patterns were defined there as sets of items that are frequently purchased together. Frequent patterns are commonly used for building association rules. For example, an association rule may state that 80% of customers who buy fish also buy white wine. This rule is derivable from the fact that fish occurs in 5% of sales transactions and the set {fish, white wine} occurs in 4% of transactions. Patterns and association rules can be generalized by admitting negation. A sample association rule with negation could state that 75% of customers who buy coke also buy chips and neither beer nor milk. Knowledge of this kind is important not only for sales managers, but also in medical areas (Tsumoto, 2002). Admitting negation in patterns usually results in an abundance of mined patterns, which makes analysis of the discovered knowledge infeasible. It is thus preferable to discover and store a possibly small fraction of patterns, from which one can derive all other significant patterns when required. In this chapter, we introduce first lossless representations of frequent patterns with negation.
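
The arithmetic behind the two example rules can be sketched in a few lines of Python. This is only an illustration and not the chapter's representation: the toy transactions and the function names `support`, `support_by_inclusion_exclusion`, and `confidence` are assumptions made here. The sketch uses the standard definition confidence(X → Y) = supp(X ∪ Y) / supp(X), and shows that the support of a pattern with negated items can be recovered from supports of purely positive patterns by inclusion-exclusion.

```python
from itertools import combinations

# Hypothetical toy database, chosen so that the coke rule has 75% confidence.
transactions = [
    {"coke", "chips"},
    {"coke", "chips"},
    {"coke", "chips"},
    {"coke", "chips", "beer"},
    {"milk"},
]


def support(positive, negative=frozenset()):
    """Fraction of transactions with every item in `positive` and none in `negative`."""
    hits = sum(1 for t in transactions if positive <= t and not (negative & t))
    return hits / len(transactions)


def support_by_inclusion_exclusion(positive, negative):
    """Same quantity, computed only from supports of positive patterns."""
    total = 0.0
    negative = list(negative)
    for k in range(len(negative) + 1):
        for subset in combinations(negative, k):
            total += (-1) ** k * support(positive | set(subset))
    return total


def confidence(support_antecedent, support_whole_rule):
    """Confidence of X -> Y given supp(X) and supp(X and Y together)."""
    if support_antecedent <= 0:
        raise ValueError("antecedent support must be positive")
    return support_whole_rule / support_antecedent


# fish -> white wine: supports of 5% and 4% give 80% confidence.
fish_rule = confidence(0.05, 0.04)

# coke -> chips and neither beer nor milk, on the toy database.
direct = support({"coke", "chips"}, {"beer", "milk"})
derived = support_by_inclusion_exclusion({"coke", "chips"}, {"beer", "milk"})
coke_rule = confidence(support({"coke"}), direct)
```

On the toy data, `direct` and `derived` agree, which is the property that lets supports of patterns with negation be derived from stored positive patterns rather than mined separately.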


2008 ◽  
pp. 2105-2120
Author(s):  
Kesaraporn Techapichetvanich ◽  
Amitava Datta

Both visualization and data mining have become important tools in discovering hidden relationships in large data sets, and in extracting useful knowledge and information from large databases. Even though many algorithms for mining association rules have been researched extensively in the past decade, they do not incorporate users in the association rule mining process. Most of these algorithms generate a large number of association rules, some of which are not practically interesting. This chapter presents a new technique that integrates visualization into the association rule mining process. Users can apply their knowledge and be involved in finding interesting association rules through interactive visualization, after obtaining visual feedback as the algorithm generates association rules. In addition, the users gain insight and deeper understanding of their data sets, as well as control over mining meaningful association rules.


Author(s):  
Carson K.-S. Leung ◽  
Fan Jiang ◽  
Edson M. Dela Cruz ◽  
Vijay Sekar Elango

Collaborative filtering uses data mining and analysis to develop a system that helps users make appropriate decisions in real-life applications by removing redundant information and providing valuable information to users. Data mining aims to extract from data implicit, previously unknown, and potentially useful information, such as association rules that reveal relationships between frequently co-occurring patterns in their antecedent and consequent parts. This chapter presents an algorithm called CF-Miner for collaborative filtering with an association rule miner. The CF-Miner algorithm first constructs bitwise data structures to capture important contents in the data. It then finds frequent patterns from the bitwise structures. Based on the mined frequent patterns, the algorithm forms association rules. Finally, the algorithm ranks the mined association rules to recommend appropriate merchandise products, goods, or services to users. Evaluation results show the effectiveness of CF-Miner in using association rule mining in collaborative filtering.
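
The abstract does not describe CF-Miner's bitwise structures in detail, so the following is not CF-Miner itself. It is a minimal, generic sketch of the underlying idea, assuming one bit vector per item (bit `tid` set when transaction `tid` contains the item) stored as a Python integer. The names `build_bitmaps` and `support_count` and the toy data are hypothetical.

```python
def build_bitmaps(transactions):
    """Map each item to an integer whose bit `tid` is 1 if transaction `tid` has it."""
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support_count(itemset, bitmaps, n_transactions):
    """Count transactions containing every item, using bitwise AND of item bitmaps."""
    mask = (1 << n_transactions) - 1  # start with all transactions selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)  # unknown items match no transaction
    return bin(mask).count("1")


# Hypothetical toy data.
toy_transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
toy_bitmaps = build_bitmaps(toy_transactions)
count_ab = support_count({"a", "b"}, toy_bitmaps, len(toy_transactions))
```

With this layout, counting the support of a candidate pattern reduces to a few AND operations followed by a bit count, which is why bitwise structures are attractive for frequent pattern mining.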


2005 ◽  
Vol 1 (3) ◽  
pp. 129-135
Author(s):  
Jun Luo ◽  
Sanguthevar Rajasekaran

Association rule mining is an important data mining problem that has been studied extensively. In this paper, a simple but Fast algorithm for Intersecting attributes lists using hash Tables (FIT) is presented. FIT is designed for efficiently computing all the frequent itemsets in large databases. It employs an idea similar to Eclat but has much better computational performance than Eclat for two reasons: 1) FIT makes fewer comparisons in total for each intersection operation between two attributes lists, and 2) FIT significantly reduces the total number of intersection operations. Our experimental results demonstrate that the performance of FIT is much better than that of the Eclat and Apriori algorithms.
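
The intersection operation that FIT and Eclat both build on can be illustrated with a generic vertical-layout sketch. This is not the FIT algorithm: its specific optimizations are not given in the abstract. The sketch simply assumes each item keeps a hash set of the transaction ids in which it occurs, and that the support of an itemset is the size of the intersection of those sets. The names `build_tid_lists` and `itemset_support_count` and the toy data are hypothetical.

```python
from collections import defaultdict


def build_tid_lists(transactions):
    """Vertical layout: item -> set of ids of the transactions containing it."""
    tid_lists = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tid_lists[item].add(tid)
    return dict(tid_lists)


def itemset_support_count(itemset, tid_lists):
    """Support count of an itemset as the size of the intersected tid sets."""
    # Intersect starting from the shortest list so that the working set stays small.
    lists = sorted((tid_lists.get(item, set()) for item in itemset), key=len)
    if not lists:
        return 0
    common = set(lists[0])
    for other in lists[1:]:
        common &= other  # hash-based membership test per element
        if not common:
            break  # no transaction can match any more
    return len(common)


# Hypothetical toy data.
toy_db = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
toy_lists = build_tid_lists(toy_db)
count_abc = itemset_support_count(["a", "b", "c"], toy_lists)
```

Starting from the shortest list and stopping as soon as the intersection is empty are two simple ways to cut down comparisons. They are in the spirit of, but not the same as, the savings the abstract claims for FIT.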


Author(s):  
Suma B. ◽  
Shobha G.

Association rule mining is a well-known data mining technique used for extracting hidden correlations between data items in large databases. In most situations, data mining results contain sensitive information about individuals, and publishing such data will violate individual secrecy. The challenge of association rule mining is to preserve the confidentiality of sensitive rules when releasing the database to external parties. The association rule hiding technique conceals the knowledge extracted by the sensitive association rules by modifying the database. In this paper, we introduce a border-based algorithm for hiding sensitive association rules. The main purpose of this approach is to conceal the sensitive rule set while maintaining the utility of the database and association rule mining results at the highest level. The performance of the algorithm in terms of side effects is demonstrated using experiments conducted on two real datasets. The results show that the information loss is minimized without sacrificing accuracy.


2009 ◽  
Vol 3 (4) ◽  
pp. 1-17 ◽  
Author(s):  
Madhu V. Ahluwalia ◽  
Aryya Gangopadhyay ◽  
Zhiyuan Chen

Association rule mining is an important data mining method that has been studied extensively by the academic community and has been applied in practice. In the context of association rule mining, the state-of-the-art in privacy preserving data mining provides solutions for categorical and Boolean association rules but not for quantitative association rules. This article fills this gap by describing a method based on discrete wavelet transform (DWT) to protect input data privacy while preserving data mining patterns for association rules. A comparison with an existing kd-tree based transform shows that the DWT-based method fares better in terms of efficiency, preserving patterns, and privacy.


Author(s):  
Manoj Kumar ◽  
Hemant Kumar Soni

Association rule mining is an iterative and interactive process of discovering valid, novel, useful, understandable, and hidden associations in massive databases. Colossal databases require powerful and intelligent tools for the analysis and discovery of frequent patterns and association rules. Several researchers have proposed many algorithms for generating itemsets and association rules, for discovering frequent patterns, and for mining association rules. These proposals are validated on static data. A dynamic database may introduce some new association rules, which may be interesting and helpful in making better business decisions. In association rule mining, the validation of the performance and cost of the existing algorithms on incremental data is less explored. Hence, there is a strong need for a comprehensive study and in-depth analysis of the existing proposals for association rule mining. In this paper, the existing tree-based algorithms for incremental data mining are presented and compared on the basis of number of scans, structure, size, and type of database. It is concluded that the Can-Tree approach dominates other algorithms such as FP-Tree, FUFP-Tree, and the FELINE algorithm with CATS-Tree. This study also highlights some hot issues and future research directions, and points out that there is a strong need for devising a new and efficient algorithm for incremental data mining.


2015 ◽  
Vol 4 (1) ◽  
pp. 156 ◽  
Author(s):  
Nada Hussein ◽  
Abdallah Alashqur ◽  
Bilal Sowan

In this digital age, organizations have to deal with huge amounts of data, sometimes called Big Data. In recent years, the volume of data has increased substantially. Consequently, finding efficient and automated techniques for discovering useful patterns and relationships in the data becomes very important. In data mining, patterns and relationships can be represented in the form of association rules. Current techniques for discovering association rules rely on measures such as support for finding frequent patterns and confidence for finding association rules. A shortcoming of confidence is that it does not capture the correlation that exists between the left-hand side (LHS) and the right-hand side (RHS) of an association rule. On the other hand, the interestingness measure lift captures such correlation in the sense that it tells us whether the LHS influences the RHS positively or negatively. Therefore, using lift instead of confidence as a criterion for discovering association rules can be more effective. It also gives the user more choices in determining the kind of association rules to be discovered. This in turn helps to narrow down the search space and, consequently, improves performance. In this paper, we describe a new approach for discovering association rules that is based on lift and not on confidence.
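
The difference between confidence and lift can be made concrete with a short sketch. It uses the standard definitions confidence(X → Y) = supp(X ∪ Y) / supp(X) and lift(X → Y) = supp(X ∪ Y) / (supp(X) · supp(Y)); it is not the discovery approach of the paper, and the support values and function names below are hypothetical.

```python
def rule_confidence(supp_lhs, supp_both):
    """Confidence of LHS -> RHS from supp(LHS) and supp(LHS and RHS together)."""
    if supp_lhs <= 0:
        raise ValueError("supp_lhs must be positive")
    return supp_both / supp_lhs


def rule_lift(supp_lhs, supp_rhs, supp_both):
    """Lift of LHS -> RHS; 1 means independence, >1 positive, <1 negative influence."""
    if supp_lhs <= 0 or supp_rhs <= 0:
        raise ValueError("supports must be positive")
    return supp_both / (supp_lhs * supp_rhs)


# Hypothetical supports: the RHS is very common on its own.
supp_lhs, supp_rhs, supp_both = 0.5, 0.8, 0.4

conf_value = rule_confidence(supp_lhs, supp_both)      # high confidence
lift_value = rule_lift(supp_lhs, supp_rhs, supp_both)  # yet no correlation

# Hypothetical supports for a rule whose LHS lowers the chance of the RHS.
negative_lift = rule_lift(0.5, 0.8, 0.3)
```

In the first case the rule reaches 80% confidence although the LHS has no effect on the RHS (lift of 1), which is the shortcoming of confidence described above. In the second case the lift falls below 1, flagging a negative influence.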


Author(s):  
Yacine Izza ◽  
Said Jabbour ◽  
Badran Raddaoui ◽  
Abdelahmid Boudane

While traditional data mining techniques have been used extensively for finding patterns in databases, they are not always suitable for incorporating user-specified constraints. To overcome this issue, CP- and SAT-based frameworks for modeling and solving pattern mining tasks have gained a considerable audience in recent years. However, a bottleneck for all these CP- and SAT-based approaches is the encoding size, which makes these algorithms inefficient for large databases. This paper introduces a practical SAT-based approach to efficiently discover (minimal non-redundant) association rules. First, we present a decomposition-based paradigm that splits the original transaction database into smaller and independent subsets. Then, we show that, without producing overly large formulas, our decomposition method allows independent mining evaluation on a multi-core machine, improving performance. Finally, an experimental evaluation shows that our method is fast and scales well compared with the existing CP approach even in the sequential case, while significantly reducing the gap with the best state-of-the-art specialized algorithm.

