Filtering Association Rules by Their Semantics and Structures

Author(s):  
Rangsipan Marukatat

Association rule mining produces a large number of rules, many of which are redundant. When a data set contains infrequent items, the minimum support threshold must be set very low; otherwise, these items will not be discovered. The downside is that a low threshold leads to even more redundancy. To deal with this dilemma, some researchers have proposed more efficient, and perhaps more complicated, rule generation methods. Others have suggested using simple rule generation methods and focusing instead on post-pruning of the rules. This chapter follows the latter approach. The classic Apriori algorithm is employed for rule generation. The goal is to gain as much insight as possible about the domain. Therefore, the discovered rules are filtered by their semantics and structures. An individual rule is classified by its own semantics, that is, by how clearly it describes the domain. It can be labelled as one of the following: strongly meaningless, weakly meaningless, partially meaningful, or meaningful. In addition, multiple rules are compared. Rules with repetitive patterns are removed, while those conveying the most complete information are retained. The authors demonstrate an application of their techniques to a real case study, an analysis of traffic accidents in Nakorn Pathom, Thailand.
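To make the idea of structural post-pruning concrete, the following minimal sketch drops a rule when a simpler rule with the same consequent and at least the same confidence already exists. This is one common redundancy criterion chosen for illustration, not necessarily the criterion used in the chapter; the `Rule` type, the `prune_redundant` helper, and the item names are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple


class Rule(NamedTuple):
    """Hypothetical container for one association rule."""
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    confidence: float


def prune_redundant(rules: List[Rule]) -> List[Rule]:
    """Keep a rule unless another rule with the same consequent, a strictly
    smaller antecedent, and at least the same confidence exists.
    (Assumed criterion for illustration only.)"""
    kept = []
    for r in rules:
        dominated = any(
            o.consequent == r.consequent
            and o.antecedent < r.antecedent  # proper subset
            and o.confidence >= r.confidence
            for o in rules
        )
        if not dominated:
            kept.append(r)
    return kept


# Made-up rules, not taken from the case study.
rules = [
    Rule(frozenset({"night"}), frozenset({"injury"}), 0.8),
    Rule(frozenset({"night", "rain"}), frozenset({"injury"}), 0.8),
    Rule(frozenset({"night", "rain"}), frozenset({"fatal"}), 0.6),
]
kept = prune_redundant(rules)
```

Here the second rule adds the item `rain` without improving confidence, so it is treated as repetitive and removed, while the other two rules are kept.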

Author(s):  
Meera Sharma ◽  
Abhishek Tandon ◽  
Madhu Kumari ◽  
V. B. Singh

Bug triaging is the process of deciding what to do with incoming bug reports. In this paper, we mine association rules to predict the assignee of a newly reported bug from four bug attributes, namely severity, priority, component and operating system. To deal with the problem of large data sets, we divide the data set into subsets using the k-means clustering algorithm. We use an Apriori algorithm in MATLAB to generate association rules, and we extract the association rules for the top 5 assignees in each cluster. The proposed method has been empirically validated on 14,696 bug reports from the Mozilla open source projects Seamonkey, Firefox and Bugzilla. In our approach, we observe that when these attributes (severity, priority, component and operating system) are taken as antecedents, essential rules outnumber redundant rules, whereas in [M. Sharma and V. B. Singh, Clustering-based association rule mining for bug assignee prediction, Int. J. Business Intell. Data Mining 11(2) (2017) 130–150.] essential rules are fewer than redundant rules in every cluster. The proposed method provides an improvement over existing techniques for the bug assignment problem.


2010 ◽  
Vol 20-23 ◽  
pp. 389-394
Author(s):  
Zhi Feng Hao ◽  
Rui Chu Cai ◽  
Tang Wu ◽  
Yi Yuan Zhou

Association rules provide a concise statement of potentially useful information and have been widely used in real applications. However, the usefulness of association rules depends heavily on the interestingness measure used to select interesting rules from millions of candidates. In this study, a probability analysis of association rules is conducted, and an interestingness measure based on discrete kernel density estimation is proposed accordingly. The newly proposed interestingness measure makes the most of the information contained in the data set and achieves a much lower false discovery rate than existing interestingness measures. Experimental results show the effectiveness of the proposed interestingness measure.


2017 ◽  
Vol 13 (21) ◽  
pp. 37-44 ◽  
Author(s):  
Meenu Gupta ◽  
Vijender Kumar Solanki ◽  
Vijay Kumar Singh

Introduction: Traffic accidents are an undesirable burden on society. Every year around one million deaths and more than ten million injuries are reported due to traffic accidents. Hence, traffic accident prevention measures must be taken to reduce the accident rate. Different countries have different geographical and environmental conditions, and hence the accident factors differ from country to country. Traffic accident data analysis is very useful in revealing the factors that affect accidents in different countries. This article was written in 2016 at the Institute of Technology & Science, Mohan Nagar, Ghaziabad, UP, India. Methodology: We propose a framework that uses association rule mining (ARM) for the severity classification of traffic accident data obtained from police records in Muzaffarnagar district, Uttar Pradesh, India. Results: The results reveal some hidden factors that help explain road accidents in this region. Conclusions: The framework enables us to find three clusters in the data set. Each cluster represents a type of accident severity, i.e. fatal, major injury and minor/no injury. The association rules exposed different factors that are associated with road accidents in each category. The extracted information can be used to adapt preventive measures to reduce accident severity in Muzaffarnagar district.


2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Zhicong Kou ◽  
Lifeng Xi

An effective data mining method that automatically extracts association rules between manufacturing capabilities and product features from the available historical data is essential for efficient and cost-effective product development and production. This paper proposes a new binary particle swarm optimization- (BPSO-) based association rule mining (BPSO-ARM) method for discovering the hidden relationships between machine capabilities and product features. In particular, BPSO-ARM does not need predefined thresholds of minimum support and confidence, which improves its applicability in real-world industrial cases. Moreover, a novel overlapping measure indication is proposed to eliminate lower quality rules and further improve the applicability of BPSO-ARM. The effectiveness of BPSO-ARM is demonstrated on a benchmark case and an industrial case in automotive part manufacturing. The performance comparison indicates that BPSO-ARM outperforms other regular methods (e.g., Apriori) for ARM. The experimental results indicate that BPSO-ARM is capable of discovering important association rules between machine capabilities and product features. This will support planners and engineers in new product design and manufacturing.


2021 ◽  
Vol 3 ◽  
Author(s):  
Oliver Haas ◽  
Luis Ignacio Lopera Gonzalez ◽  
Sonja Hofmann ◽  
Christoph Ostgathe ◽  
Andreas Maier ◽  
...  

We propose a novel knowledge extraction method based on Bayesian-inspired association rule mining to classify anxiety in heterogeneous, routinely collected data from 9,924 palliative patients. The method extracts association rules mined using lift and local support as selection criteria. The extracted rules are used to assess the maximum evidence supporting and rejecting anxiety for each patient in the test set. We evaluated the predictive accuracy by calculating the area under the receiver operating characteristic curve (AUC). The evaluation produced an AUC of 0.89 and a set of 55 atomic rules with one item in the premise and the conclusion, respectively. The selected rules include variables like pain, nausea, and various medications. Our method outperforms the previous state of the art (AUC = 0.72). We analyzed the relevance and novelty of the mined rules. Palliative experts were asked about the correlation between variables in the data set and anxiety. By comparing expert answers with the retrieved rules, we grouped rules into expected and unexpected ones and found several rules for which experts' opinions and the data-backed rules differ, most notably with the patients' sex. The proposed method offers a novel way to predict anxiety in palliative settings using routinely collected data with an explainable and effective model based on Bayesian-inspired association rule mining. The extracted rules give further insight into potential knowledge gaps in the palliative care field.
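As a rough illustration of the lift criterion for atomic rules (one item in the premise, one in the conclusion), the sketch below computes plain global support and lift over a toy set of records. The abstract does not define "local support", so it is not reproduced here; the helper names and the data are made up for illustration and are not taken from the study.

```python
from typing import Iterable, List, Set


def support(records: List[Set[str]], items: Iterable[str]) -> float:
    """Fraction of records that contain every item in `items`."""
    wanted = set(items)
    return sum(wanted <= r for r in records) / len(records)


def lift(records: List[Set[str]], premise: str, conclusion: str) -> float:
    """lift(premise -> conclusion) = supp(both) / (supp(premise) * supp(conclusion))."""
    s_premise = support(records, [premise])
    s_conclusion = support(records, [conclusion])
    if s_premise == 0 or s_conclusion == 0:
        return 0.0  # rule is undefined on this data; treat as uninformative
    return support(records, [premise, conclusion]) / (s_premise * s_conclusion)


# Toy records (hypothetical, for illustration only).
records = [
    {"pain", "anxiety"},
    {"pain", "anxiety"},
    {"nausea"},
    {"pain"},
]
pain_to_anxiety = lift(records, "pain", "anxiety")
```

A lift above 1 suggests that the premise and conclusion co-occur more often than they would if they were independent; a lift below 1 suggests the opposite.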


Semantic Web ◽  
2013 ◽  
pp. 76-96
Author(s):  
Luca Cagliero ◽  
Tania Cerquitelli ◽  
Paolo Garza

This paper presents a novel semi-automatic approach to constructing conceptual ontologies over structured data by exploiting both the schema and the content of the input dataset. It combines two well-founded database and data mining techniques, i.e., functional dependency discovery and association rule mining, to support domain experts in the construction of meaningful ontologies, tailored to the analyzed data, by using Description Logic (DL). To this end, functional dependencies are first discovered to highlight valuable conceptual relationships among attributes of the data schema (i.e., among concepts). The set of discovered correlations supports analysts in the assertion of the Tbox ontological statements (i.e., the statements involving shared data conceptualizations and their relationships). Then, the analyst-validated dependencies are exploited to drive the association rule mining process. Association rules represent relevant and hidden correlations among data content, and they are used to provide valuable knowledge at the instance level. Pushing functional dependency constraints into the rule mining process allows analysts to examine and exploit only the most significant data item recurrences in the assertion of the Abox ontological statements (i.e., the statements involving concept instances and their relationships).


Author(s):  
Carson Kai-Sang Leung

The problem of association rule mining was introduced in 1993 (Agrawal et al., 1993). Since then, it has been the subject of numerous studies. Most of these studies focused on either performance issues or functionality issues. The former considered how to compute association rules efficiently, whereas the latter considered what kinds of rules to compute. Examples of the former include the Apriori-based mining framework (Agrawal & Srikant, 1994), its performance enhancements (Park et al., 1997; Leung et al., 2002), and the tree-based mining framework (Han et al., 2000); examples of the latter include extensions of the initial notion of association rules to other rules such as dependence rules (Silverstein et al., 1998) and ratio rules (Korn et al., 1998). In general, most of these studies basically considered the data mining exercise in isolation. They did not explore how data mining can interact with the human user, which is a key component in the broader picture of knowledge discovery in databases. Hence, they provided little or no support for user focus. Consequently, the user usually needs to wait for a long period of time to get numerous association rules, out of which only a small fraction may be interesting to the user. In other words, the user often incurs a high computational cost that is disproportionate to what he wants to get. This calls for constraint-based association rule mining.
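One simple way to picture constraint-based mining is to push a user-supplied item constraint into an Apriori-style level-wise search, so that itemsets the user does not care about are never counted. The sketch below is a minimal illustration under that assumption; the function name, the `allowed_items` parameter, and the toy transactions are hypothetical and do not reproduce any specific algorithm from the cited works.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Optional, Set


def frequent_itemsets(
    transactions: List[Set[str]],
    min_support: float,
    allowed_items: Optional[Iterable[str]] = None,
) -> Dict[FrozenSet[str], float]:
    """Level-wise frequent itemset mining restricted to `allowed_items`."""
    n = len(transactions)
    items = {i for t in transactions for i in t}
    if allowed_items is not None:
        items &= set(allowed_items)  # constraint applied before any counting

    current = [frozenset([i]) for i in sorted(items)]
    result: Dict[FrozenSet[str], float] = {}
    k = 1
    while current:
        # Count candidates of size k and keep those meeting min_support.
        frequent = {}
        for c in current:
            s = sum(c <= t for t in transactions) / n
            if s >= min_support:
                frequent[c] = s
        result.update(frequent)

        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        current = [
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        ]
    return result


# Toy transactions (hypothetical).
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
unconstrained = frequent_itemsets(transactions, 0.5)
constrained = frequent_itemsets(transactions, 0.5, allowed_items={"a", "b"})
```

With the constraint in place, the search only ever considers itemsets over `a` and `b`, which is the kind of user focus the paragraph above argues for.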

