Rare Association Rule Mining and Knowledge Discovery
Latest Publications


Total documents: 16 (five years: 0)
H-index: 2 (five years: 0)

Published By IGI Global

9781605667546, 9781605667553

Author(s): Richard A. O’Keefe, Nathan Rountree

In this chapter, the authors discuss the characteristics of data collected by the New Zealand Centre for Adverse Drug Reaction Monitoring (CARM) over a five-year period. The authors begin by noting the ways in which adverse reaction data are similar to market basket data, and the ways in which they are different. They go on to develop a model for estimating the amount of missing data in the dataset, and another to decide whether a drug is rare simply because it was only available for a short time. They also discuss the notion of “rarity” with respect to drugs, and with respect to reactions. Although the discussion is confined to the CARM data, the models and techniques presented here are useful to anyone who is about to embark on an association mining project, or who needs to interpret association rules in the context of a particular database.


Author(s): Marco-Antonio Balderas Cepeda

Association rule mining has been a highly active research field over the past decade, and the extraction of frequency-related patterns has been applied to several domains. However, the way association rules are defined has limited the ability to obtain all the patterns of interest. In this chapter, the authors present an alternative approach that yields new kinds of association rules representing deviations from common behaviors. These new rules are called anomalous rules. Obtaining such rules requires extracting all the most frequent patterns together with certain extension patterns that may occur very infrequently. An approach that relies on anomalous rules has possible applications in counterterrorism, fraud detection, pharmaceutical data analysis and network intrusion detection. The authors adapt measures of interest to anomalous rule sets and propose an algorithm that can extract anomalous rules. Their experiments with benchmark and real-life datasets suggest that the set of anomalous rules is smaller than the set of association rules. Their work also provides evidence that the proposed approach can discover hidden patterns with good reliability.


Author(s): Khaled M. Elbassioni

The authors consider databases in which each attribute takes values from a partially ordered set (poset). This allows one to model a number of interesting scenarios arising in different applications, including quantitative databases, taxonomies, and databases in which each attribute is an interval representing the duration of a certain event occurring over time. A natural problem that arises in such circumstances is the following: given a database D and a threshold value t, find all collections of “generalizations” of attributes which are “supported” by fewer than t transactions from D. They call such collections infrequent elements. Due to monotonicity, the output size can be reduced by considering only minimal infrequent elements. The authors study the complexity of finding all minimal infrequent elements for some interesting classes of posets. They show how this problem can be applied to mining association rules in different types of databases, and to finding “sparse regions” or “holes” in quantitative data or in databases recording the time intervals during which a recurring event appears over time. Their main focus is on these applications rather than on the correctness or analysis of the given algorithms.
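The problem statement above can be illustrated in its simplest special case, where every attribute is binary (an item is either present in a transaction or not), so that "generalizations" reduce to ordinary itemsets. The following Python sketch is a brute-force enumeration written for illustration only; it is not the chapter's algorithm, and the function names and the toy database are assumptions. It also assumes t is at most the number of transactions, so the empty itemset is never infrequent.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def minimal_infrequent_itemsets(transactions, t):
    """Brute-force search for minimal itemsets supported by fewer than t transactions.

    Candidates are visited in order of increasing size, so any infrequent
    candidate that contains no previously found minimal itemset must have
    only frequent proper subsets, and is therefore minimal itself.
    """
    items = sorted(set().union(*transactions))
    minimal = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            # Monotonicity: supersets of an infrequent itemset are infrequent,
            # hence not minimal, so they can be skipped without counting.
            if any(m <= candidate for m in minimal):
                continue
            if support(candidate, transactions) < t:
                minimal.append(candidate)
    return minimal


# Hypothetical toy database D and threshold t.
D = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "d"}]
print(minimal_infrequent_itemsets(D, t=2))
```

On this toy database, every pair drawn from a, b and c is supported by two transactions, so the only minimal infrequent itemsets are {d} and {a, b, c}. The enumeration is exponential in the number of items and is meant only to make the definition concrete.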


Author(s): Huaifeng Zhang, Yanchang Zhao, Longbing Cao, Chengqi Zhang, Hans Bohlscheid

In this chapter, the authors propose a novel framework for rare class association rule mining. In each class association rule, the right-hand side is a target class while the left-hand side may contain one or more attributes. The algorithm focuses on multiple imbalanced attributes on the left-hand side. In the proposed framework, the rules with and without imbalanced attributes are processed in parallel. The rules without imbalanced attributes are mined through a standard algorithm, while the rules with imbalanced attributes are mined based on newly defined measurements. Through a simple transformation, these measurements can be placed in a uniform space so that only a few parameters need to be specified by the user. In the case study, the proposed algorithm is applied in the social security field. Although some attributes are severely imbalanced, rules with a minority of imbalanced attributes have been mined efficiently.


Author(s): Ling Zhou, Stephen Yau

Association rule mining among frequent items has been extensively studied in data mining research. In recent years, however, there has been an increasing demand for mining infrequent items (such as rare but expensive items). Since interesting relationships among infrequent items have not been discussed much in the literature, the authors propose in this chapter two simple, practical and effective schemes to mine association rules among rare items. Their algorithms can also be applied to frequent items with bounded length. Experiments are performed on the well-known IBM synthetic database. The authors’ schemes compare favorably to Apriori and FP-growth in the settings evaluated. In addition, they explore quantitative association rule mining among infrequent items in transactional databases by associating quantities of items; some interesting examples are drawn to illustrate the significance of such mining.


Author(s): Rangsipan Marukatat

Association rule mining produces a large number of rules, many of which are usually redundant. When a data set contains infrequent items, the minimum support criterion must be set very low; otherwise, these items will not be discovered. The downside is that this leads to even more redundancy. To deal with this dilemma, some researchers have proposed more efficient, and perhaps more complicated, rule generation methods. Others have suggested using simple rule generation methods and focusing instead on post-pruning of the rules. This chapter follows the latter approach. The classic Apriori algorithm is employed for rule generation. The goal is to gain as much insight as possible about the domain. Therefore, the discovered rules are filtered by their semantics and structures. An individual rule is classified by its own semantics, that is, by how clear its domain description is. It can be labelled as one of the following: strongly meaningless, weakly meaningless, partially meaningful, or meaningful. In addition, multiple rules are compared. Rules with repetitive patterns are removed, while those conveying the most complete information are retained. The authors demonstrate an application of their techniques to a real case study, an analysis of traffic accidents in Nakorn Pathom, Thailand.


Author(s): Szymon Jaroszewicz

The paper presents an approach to mining patterns in numerical data without the need for discretization. The proposed method allows for discovery of arbitrary nonlinear relationships. The approach is based on finding a function of a set of attributes whose values are close to zero in the data. Intuitively such functions correspond to equations describing relationships between the attributes, but they are also able to capture more general classes of patterns. The approach is set in an association rule framework with analogues of itemsets and rules defined for numerical attributes. Furthermore, the user may include background knowledge in the form of a probabilistic model. Patterns which are already correctly predicted by the model will not be considered interesting. Interesting patterns can then be used by the user to update the probabilistic model.


Author(s): Maybin Muyeba, M. Sulaiman Khan, Frans Coenen

A novel approach is presented for effectively mining weighted fuzzy association rules (ARs). The authors address the invalidation of the downward closure property (DCP) in weighted association rule mining, where each item is assigned a weight according to its significance with respect to some user-defined criteria. Most works on weighted association rule mining do not address the downward closure property, while some make assumptions to validate the property. This chapter generalizes the weighted association rule mining problem to binary and fuzzy attributes with weighted settings. The methodology follows an Apriori approach but employs a T-tree data structure to improve the efficiency of counting itemsets. The authors’ approach avoids pre- and post-processing, as opposed to most weighted association rule mining algorithms, thus eliminating the extra steps during rule generation. The chapter presents experimental results on both synthetic and real data sets and a discussion on evaluating the proposed approach.
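To see why item weights can invalidate the downward closure property, consider one simple weighted support definition that appears in the weighted association rule literature: the sum of the item weights multiplied by the ordinary support. This definition, the weights, the threshold and the toy transactions in the Python sketch below are all assumptions chosen for illustration; the chapter's own definitions may differ.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def weighted_support(itemset, transactions, weights):
    """Assumed definition: (sum of item weights) * ordinary support."""
    return sum(weights[i] for i in itemset) * support(itemset, transactions)


# Hypothetical data: item "a" is common but unimportant, "b" is rarer but important.
transactions = [{"a"}, {"a", "b"}, {"a", "b"}, {"a"}]
weights = {"a": 0.1, "b": 0.9}
min_wsup = 0.3  # hypothetical minimum weighted support

for itemset in ({"a"}, {"a", "b"}):
    ws = weighted_support(itemset, transactions, weights)
    print(sorted(itemset), round(ws, 3), "frequent" if ws >= min_wsup else "infrequent")
```

Here {a} has weighted support 0.1 and fails the threshold, while its superset {a, b} has weighted support 0.5 and passes it. An Apriori-style search that prunes every superset of an infrequent itemset would therefore miss {a, b} under this definition, which is the kind of DCP invalidation the abstract refers to.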


Author(s): Dong (Haoyuan) Li, Anne Laurent, Pascal Poncelet

Frequency-based interestingness measures are common criteria in data mining methods and provide a statistical view of the correlations in the data, such as sequential patterns. However, when domain knowledge is considered within the mining process, unexpected information that contradicts existing knowledge about the data is no less important than regularly frequent information. For this purpose, the authors present the USER approach for mining unexpected sequential rules in sequence databases. They propose a belief-driven formalization of the unexpectedness contained in sequential data, with which they define three forms of unexpected sequences. They further propose the notions of unexpected sequential patterns and implication rules for determining the structures and implications of the unexpectedness. Experimental results on various types of data sets show the usefulness and effectiveness of their approach.


Author(s): Markus Breitenbach, William Dieterich, Tim Brennan, Adrian Fan

In this chapter, the authors explore Area Under the Curve (AUC) as an error metric suitable for imbalanced data, and survey methods of optimizing this metric directly. They also address the issue of cut-point thresholds for practical decision-making. The techniques are illustrated by a study that examines predictive rule development and validation procedures for establishing risk levels for violent felony crimes committed when criminal offenders are released from prison in the USA. The “violent felony” category was selected as the key outcome since these crimes are a major public safety concern, have a low base rate (around 7%), and represent the most extreme forms of violence. The authors compare the performance of different algorithms on the dataset and use survival analysis to validate whether the risk scores produced by these techniques are reasonable estimates of the true risk.
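As a brief illustration of why AUC suits imbalanced data, it can be computed as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. This depends only on how the scores rank the two classes, not on the base rate or on any particular cut-point. The Python sketch below uses the straightforward pairwise formulation; the function name and the toy scores are assumptions, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    `labels` uses 1 for the positive (rare) class and 0 for the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores for six cases, two of which are positive.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5]
labels = [1, 0, 1, 0, 0, 0]
print(auc(scores, labels))  # 7 of the 8 positive-negative pairs are ranked correctly: 0.875
```

A classifier that labels every case negative would reach about 93% accuracy at a 7% base rate while ranking nothing, which is why a ranking-based measure such as AUC is more informative in this setting.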

