Extraction of Association Rule Mining using Apriori algorithm with Wolf Search Optimisation in R Programming

Association rule mining produces a large number of rules, but many of them are usually redundant. When a data set contains infrequent items, we need to set the minimum support criterion very low; otherwise, these items will not be discovered. The downside is that this leads to even more redundancy. To deal with this dilemma, some researchers have proposed more efficient, and perhaps more complicated, rule generation methods. Others have suggested using simple rule generation methods and focusing instead on post-pruning of the rules. This chapter follows the latter approach. The classic Apriori algorithm is employed for rule generation. Our goal is to gain as much insight as possible about the domain. Therefore, the discovered rules are filtered by their semantics and structures. An individual rule is classified by its own semantics, that is, by how clearly it describes the domain. It can be labelled as one of the following: strongly meaningless, weakly meaningless, partially meaningful, or meaningful. In addition, multiple rules are compared. Rules with repetitive patterns are removed, while those conveying the most complete information are retained. We demonstrate an application of our techniques to a real case study, an analysis of traffic accidents in Nakorn Pathom, Thailand.

Reduction of Redundant Rules in Association Rule Mining-Based Bug Assignment

International Journal of Reliability Quality and Safety Engineering ◽

10.1142/s0218539317400058 ◽

2017 ◽

Vol 24 (06) ◽

pp. 1740005 ◽

Cited By ~ 3

Author(s):

Meera Sharma ◽

Abhishek Tandon ◽

Madhu Kumari ◽

V. B. Singh

Keyword(s):

Operating System ◽

Association Rules ◽

Association Rule ◽

Association Rule Mining ◽

Clustering Algorithm ◽

Large Data ◽

Software Project ◽

Rule Mining ◽

Data Set ◽

Bug Reports

Bug triaging is the process of deciding what to do with newly reported bugs. In this paper, we mine association rules to predict the assignee of a newly reported bug using different bug attributes, namely severity, priority, component and operating system. To deal with the problem of large data sets, we divide the large data set into subsets using the k-means clustering algorithm. We use an Apriori algorithm in MATLAB to generate association rules, and we extract the association rules for the top 5 assignees in each cluster. The proposed method has been empirically validated on 14,696 bug reports from the Mozilla open source software projects Seamonkey, Firefox and Bugzilla. In our approach, we observe that, taking these attributes (severity, priority, component and operating system) as antecedents, essential rules outnumber redundant rules, whereas in [M. Sharma and V. B. Singh, Clustering-based association rule mining for bug assignee prediction, Int. J. Business Intell. Data Mining 11(2) (2017) 130–150] redundant rules outnumber essential rules in every cluster. The proposed method provides an improvement over existing techniques for the bug assignment problem.
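The distinction between essential and redundant rules can be illustrated with a minimal Python sketch. It assumes one common definition of redundancy, which is not necessarily the one used in the paper: a rule is redundant when a more general rule (a strict subset of its antecedent, same consequent) has at least the same confidence. The function name `prune_redundant` and the tuple layout are hypothetical.

```python
def prune_redundant(rules):
    """Keep rules that have no more general rule with at least equal confidence.

    Each rule is a tuple (antecedent: frozenset, consequent: str, confidence: float).
    Assumed definition: a rule is redundant if another rule with the same
    consequent has a strictly smaller antecedent and confidence >= its own.
    """
    kept = []
    for ante, cons, conf in rules:
        redundant = any(
            other_cons == cons and other_ante < ante and other_conf >= conf
            for other_ante, other_cons, other_conf in rules
        )
        if not redundant:
            kept.append((ante, cons, conf))
    return kept
```

Under this definition, adding `os=Linux` to an antecedent that already predicts an assignee with equal or higher confidence contributes nothing, so the longer rule is dropped.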

Improved Classification Techniques to Predict the Co-disease in Diabetic Mellitus Patients using Discretization and Apriori Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k1434.0981119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 730-733

Keyword(s):

Data Mining ◽

Association Rules ◽

Census Data ◽

Early Stage ◽

Research Work ◽

Numerical Data ◽

Medical Data ◽

Data Sets ◽

Apriori Algorithm ◽

Data Set

The demand for data mining in the medical industry is now unavoidable because of its many applications in predicting diseases at an early stage. Data mining methods make it easy to extract useful patterns and quick to recognize task-based outcomes. Classification models are useful for building classes from medical data sets so that future analysis can be carried out accurately. In addition, association rules are a promising technique for finding hidden patterns in a medical data set and have been successfully applied to market basket data, census data and financial data. The Apriori algorithm, considered a classic algorithm, is useful for mining frequent item sets in a database containing a large number of transactions, and it also yields the relevant association rules. Association rules capture the relationships among items present in a data set. When the data set contains continuous attributes, the existing algorithms may not work; discretization can therefore be applied to the continuous attributes in order to find relations between the various patterns in the data set. In this paper, Discretized Apriori is used to predict the co-disease in people who are found to have diabetic syndrome, and the extracted rules are analyzed. In the discretization step, numerical data is discretized and fed to the Apriori algorithm to obtain better association rules for predicting the diseases.
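The discretization step described above can be sketched in Python as follows. This is a minimal illustration that assumes equal-width binning; the paper does not state which discretization scheme it uses, and the function names, bin labels and field names here are hypothetical.

```python
def equal_width_bins(values, n_bins=3):
    """Map each numeric value to a bin index in 0..n_bins-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 when all values are equal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def discretize_records(records, numeric_fields, labels=("low", "mid", "high")):
    """Turn records into transactions of 'field=value' items.

    Numeric fields are replaced by a bin label; other fields are kept as-is.
    """
    binned = {
        f: equal_width_bins([r[f] for r in records], n_bins=len(labels))
        for f in numeric_fields
    }
    transactions = []
    for i, record in enumerate(records):
        items = set()
        for field, value in record.items():
            if field in numeric_fields:
                items.add(f"{field}={labels[binned[field][i]]}")
            else:
                items.add(f"{field}={value}")
        transactions.append(items)
    return transactions
```

The resulting transactions contain only categorical items, so they can be passed to any Apriori implementation that expects item sets.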

DISTRIBUTED MINING OF ASSOCIATION RULES BASED ON REDUCING THE SUPPORT THRESHOLD

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213008004321 ◽

2008 ◽

Vol 17 (06) ◽

pp. 1109-1129 ◽

Cited By ~ 5

Author(s):

BASILIS BOUTSINAS ◽

COSTAS SIOTOS ◽

ANTONIS GEROLIMATOS

Keyword(s):

Association Rules ◽

Association Rule ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Rule Mining ◽

Data Set ◽

Processing Power ◽

Support Threshold ◽

Empirical Tests

One of the most important data mining problems is learning association rules of the form "90% of the customers that purchase product x also purchase product y". Discovering association rules from huge volumes of data requires substantial processing power. In this paper we present an efficient distributed algorithm for mining association rules that reduces the time complexity by a magnitude that renders it suitable for scaling up to very large data sets. The proposed algorithm is based on partitioning the initial data set into subsets and processing each subset in parallel. By reducing the support threshold while processing the subsets, the proposed algorithm can maintain the set of association rules that would be extracted by applying an association rule mining algorithm to all the data. These claims are confirmed by the empirical tests that we present, which also demonstrate the utility of the method.
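The partition-and-merge idea can be sketched in Python as follows. This is a minimal illustration, not the authors' algorithm: the `reduction` factor, the round-robin partitioning, the brute-force counting and the final global verification pass are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Return {itemset: support} for itemsets of up to max_size items (brute force)."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def partitioned_mining(transactions, min_support, n_parts=2, reduction=0.5):
    """Mine each partition with a reduced threshold, then verify candidates globally."""
    parts = [transactions[i::n_parts] for i in range(n_parts)]
    local_threshold = min_support * reduction  # hypothetical reduction factor
    candidates = set()
    for part in parts:  # each iteration could run on a separate worker
        candidates |= set(frequent_itemsets(part, local_threshold))
    n = len(transactions)
    result = {}
    for cand in candidates:
        support = sum(1 for t in transactions if cand <= set(t)) / n
        if support >= min_support:
            result[cand] = support
    return result
```

Any item set that is frequent in the whole data set must reach the global threshold in at least one partition, so lowering the local threshold only widens the candidate set; the verification pass then restores the exact global result.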

A Kernel Density Estimation Based Interestingness Measure for Association Rule Mining

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.20-23.389 ◽

2010 ◽

Vol 20-23 ◽

pp. 389-394

Author(s):

Zhi Feng Hao ◽

Rui Chu Cai ◽

Tang Wu ◽

Yi Yuan Zhou

Keyword(s):

Density Estimation ◽

Association Rules ◽

Kernel Density Estimation ◽

Association Rule ◽

Association Rule Mining ◽

Kernel Density ◽

Rule Mining ◽

Data Set ◽

Interestingness Measures ◽

Interestingness Measure

Association rules provide a concise statement of potentially useful information and have been widely used in real applications. However, the usefulness of association rules depends heavily on the interestingness measure used to select interesting rules from millions of candidates. In this study, a probability analysis of association rules is conducted, and an interestingness measure based on discrete kernel density estimation is proposed accordingly. The newly proposed interestingness measure makes the most of the information contained in the data set and obtains a much lower false discovery rate than the existing interestingness measures. Experimental results show the effectiveness of the proposed interestingness measure.
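For context, the standard interestingness measures that such proposals are usually compared against can be written as follows. These are the textbook definitions of support, confidence and lift for a rule X ⇒ Y over a transaction database D; they are given here as background and are not the kernel density estimation based measure proposed in the paper.

```latex
\[
\mathrm{supp}(X) = \frac{|\{\, t \in D : X \subseteq t \,\}|}{|D|}, \qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}, \qquad
\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)}
\]
```

A lift of 1 means X and Y occur together exactly as often as expected under independence; values above 1 indicate a positive association.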

Mining Temporal Association Rules with Temporal Soft Sets

Journal of Mathematics ◽

10.1155/2021/7303720 ◽

2021 ◽

Vol 2021 ◽

pp. 1-17

Author(s):

Xiaoyan Liu ◽

Feng Feng ◽

Qian Wang ◽

Ronald R. Yager ◽

Hamido Fujita ◽

...

Keyword(s):

Association Rules ◽

Association Rule ◽

Temporal Association ◽

Data Sets ◽

Rule Mining ◽

Soft Sets ◽

Data Set ◽

Temporal Association Rule ◽

Temporal Aspect ◽

Necessary And Sufficient

Traditional association rule extraction may run into some difficulties due to ignoring the temporal aspect of the collected data. Particularly, it happens in many cases that some item sets are frequent during specific time periods, although they are not frequent in the whole data set. In this study, we make an effort to enhance conventional rule mining by introducing temporal soft sets. We define temporal granulation mappings to induce granular structures for temporal transaction data. Using this notion, we define temporal soft sets and their Q-clip soft sets to establish a novel framework for mining temporal association rules. A number of useful characterizations and results are obtained, including a necessary and sufficient condition for fast identification of strong temporal association rules. By combining temporal soft sets with NegNodeset-based frequent item set mining techniques, we develop the negFIN-based soft temporal association rule mining (negFIN-STARM) method to extract strong temporal association rules. Numerical experiments are conducted on commonly used data sets to show the feasibility of our approach. Moreover, comparative analysis demonstrates that the newly proposed method achieves higher execution efficiency than three well-known approaches in the literature.
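The observation that an item set can be frequent within a specific time period without being frequent overall can be illustrated with a small Python sketch. It only groups transactions by period and computes plain support per group; it does not implement temporal soft sets or negFIN-STARM. The function names and the `period_of` callback are hypothetical.

```python
from collections import defaultdict


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def support_by_period(itemset, timed_transactions, period_of):
    """Group (timestamp, items) pairs by period and compute support per group."""
    groups = defaultdict(list)
    for timestamp, items in timed_transactions:
        groups[period_of(timestamp)].append(set(items))
    return {period: support(itemset, txs) for period, txs in groups.items()}
```

With a minimum support of 0.5, an item set whose support is 2/3 in one period and 0 in the others would be discarded by a whole-data-set analysis but retained by a per-period one.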

Improved Classification Techniques to Predict the Co-disease in Diabetic Mellitus Patients using Discretization and Apriori Algorithm

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k1434.0881119 ◽

2019 ◽

Vol 8 (11) ◽

pp. 730-733

Keyword(s):

Data Mining ◽

Association Rules ◽

Census Data ◽

Early Stage ◽

Research Work ◽

Numerical Data ◽

Medical Data ◽

Data Sets ◽

Apriori Algorithm ◽

Data Set

The demand for data mining in the medical industry is now unavoidable because of its many applications in predicting diseases at an early stage. Data mining methods make it easy to extract useful patterns and quick to recognize task-based outcomes. Classification models are useful for building classes from medical data sets so that future analysis can be carried out accurately. In addition, association rules are a promising technique for finding hidden patterns in a medical data set and have been successfully applied to market basket data, census data and financial data. The Apriori algorithm, considered a classic algorithm, is useful for mining frequent item sets in a database containing a large number of transactions, and it also yields the relevant association rules. Association rules capture the relationships among items present in a data set. When the data set contains continuous attributes, the existing algorithms may not work; discretization can therefore be applied to the continuous attributes in order to find relations between the various patterns in the data set. In this paper, Discretized Apriori is used to predict the co-disease in people who are found to have diabetic syndrome, and the extracted rules are analyzed. In the discretization step, numerical data is discretized and fed to the Apriori algorithm to obtain better association rules for predicting the diseases.

Identification of Malignant Mesothelioma Risk Factors through Association Rule Mining

10.20944/preprints201911.0117.v1 ◽

2019 ◽

Author(s):

Talha Mahboob Alam

Keyword(s):

Risk Factors ◽

Association Rules ◽

Malignant Mesothelioma ◽

Exposure Duration ◽

Asbestos Exposure ◽

Rule Mining ◽

Data Set ◽

Duration Of Symptoms ◽

Factors Associated ◽

Irrelevant Attributes

Malignant mesothelioma is a rare proliferative cancer that develops in the thin layer of tissues surrounding the lungs. It is associated with an extremely poor prognosis, and the majority of patients do not show symptoms. The epidemiology of mesothelioma is important for the identification of the disease. The primary aim of this study is to explore the risk factors associated with mesothelioma. The dataset consists of healthy and mesothelioma patients, but only mesothelioma patients were selected for the identification of symptoms. The raw data set was pre-processed, and the Apriori method was then used to generate association rules under various configurations. The pre-processing task involved removing duplicated and irrelevant attributes, balancing the dataset, converting numerical attributes to nominal ones, and creating the association rules in the dataset. The Apriori algorithm identified strong associations among the following disease factors: asbestos exposure, duration of asbestos exposure, duration of symptoms, erythrocyte sedimentation rate and pleural to serum LDH ratio. The identification of risk factors associated with mesothelioma may help keep patients from reaching the high-danger stage of the disease. It will also help to control the comorbidities associated with mesothelioma, which are cardiovascular diseases, cancer-related emotional distress, diabetes, anemia, and hypothyroidism.
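The Apriori step that this and several other abstracts on this page rely on can be sketched in Python as follows. This is a minimal, unoptimized illustration of level-wise frequent item set mining followed by rule generation; it is not the configuration used in the study, and the function names `apriori` and `generate_rules` are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise frequent item set mining; returns {frozenset: support}."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)

    def supp(itemset):
        return sum(1 for t in tx if itemset <= t) / n

    items = {item for t in tx for item in t}
    level = {frozenset([item]) for item in items}
    level = {s for s in level if supp(s) >= min_support}
    frequent = {}
    k = 1
    while level:
        frequent.update({s: supp(s) for s in level})
        k += 1
        # Join step: unions of frequent (k-1)-item sets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if supp(c) >= min_support}
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, itemset_support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                confidence = itemset_support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules
```

The prune step relies on the downward-closure property: a set can be frequent only if all of its subsets are, which is also why every antecedent looked up in `generate_rules` is guaranteed to be present in `frequent`.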

Analysis of Data on Staff Turnover Using Association Rules and Predictive Techniques

Quality Innovation Prosperity ◽

10.12776/qip.v22i2.1122 ◽

2018 ◽

Vol 22 (2) ◽

pp. 82 ◽

Cited By ~ 1

Author(s):

Lenka Girmanova ◽

Zuzana Gašparová

Keyword(s):

Data Mining ◽

Decision Trees ◽

Association Rules ◽

Employee Turnover ◽

Quality Data ◽

Limiting Factor ◽

Data Set ◽

R Programming ◽

Correct Communication ◽

Evaluation Of Data

Purpose: The purpose of this paper is to present the results of an analysis and evaluation of data on employee turnover in a specific organisation, based on deep data mining using association rules and decision trees. Methodology/Approach: For the analysis, we chose deep data mining methods, primarily a search for association rules using the Apriori algorithm in the R programming language. To supplement and compare the results, the data were also analysed using the predictive decision trees method, applying the C5.0, rpart and ctree algorithms in R. Findings: The results of the analyses showed that observing the basic principles of correct communication from the beginning of an employment relationship, or during hiring, is justified. Communication and regular conversations between a superior and employees can help identify problems earlier, address them and reduce the number of people leaving the company. The results of the analysis helped the organisation to set measures to reduce the number of employees leaving. Research Limitation/Implication: A limiting factor in performing such analyses is the availability of quality data in the required quantity. Our most significant advantage when performing our analysis was that quality data were available. To create the final structure of the required data set, we used data from the organisation’s internal information systems. Originality/Value of paper: This contribution offers a new approach to analysing data on employee turnover, the essence of which is finding the most interesting and frequent correlations in a significant amount of data.

Binary Particle Swarm Optimization-Based Association Rule Mining for Discovering Relationships between Machine Capabilities and Product Features

Mathematical Problems in Engineering ◽

10.1155/2018/2456010 ◽

2018 ◽

Vol 2018 ◽

pp. 1-16 ◽

Cited By ~ 2

Author(s):

Zhicong Kou ◽

Lifeng Xi

Keyword(s):

Particle Swarm Optimization ◽

Association Rules ◽

Association Rule ◽

Association Rule Mining ◽

Particle Swarm ◽

Performance Comparison ◽

Binary Particle Swarm Optimization ◽

Rule Mining ◽

Swarm Optimization ◽

Product Features

An effective data mining method to automatically extract association rules between manufacturing capabilities and product features from the available historical data is essential for efficient and cost-effective product development and production. This paper proposes a new binary particle swarm optimization- (BPSO-) based association rule mining (BPSO-ARM) method for discovering the hidden relationships between machine capabilities and product features. In particular, BPSO-ARM does not need predefined thresholds of minimum support and confidence, which improves its applicability in real-world industrial cases. Moreover, a novel overlapping measure indication is proposed to eliminate lower-quality rules and further improve the applicability of BPSO-ARM. The effectiveness of BPSO-ARM is demonstrated on a benchmark case and an industrial case in automotive part manufacturing. The performance comparison indicates that BPSO-ARM outperforms other regular methods (e.g., Apriori) for ARM. The experimental results indicate that BPSO-ARM is capable of discovering important association rules between machine capabilities and product features. This will help support planners and engineers in new product design and manufacturing.
