SARM — Succinct Association Rule Mining: An Approach to Enhance Association Mining

A software system evolves as changes are made to accommodate new features and repair defects. Software components are frequently interdependent, so changes made to one component can result in changes having to be made to other components to ensure that the system remains consistent; this is called change propagation. Accurate detection of change propagation is essential for software maintenance, which can be aided by accurate prediction of change propagation. In this paper, we study change propagation in three leading open-source software products: Linux, FreeBSD, and Apache HTTP Server. We use association rules-based data-mining techniques to detect change-propagation rules from the product version history. These rules are evaluated with respect to different training data sets and different test data sets. We discuss the applicability of using association-rule mining for change propagation, and several related issues. We find that a challenging issue in association-rule mining, concept drift, exists in software systems. Concept drift complicates the task of change-propagation prediction and requires special approaches, different from currently-used techniques for predicting change propagation.
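As an illustration of the idea above, here is a minimal sketch of mining "file A changed, so file B changes too" rules from a version history and checking them on later commits. It is not the authors' method: the commit list, the file names, the thresholds, and the helper names `mine_cochange_rules` and `precision` are all hypothetical, chosen only to show how rules learned on an older period can stop holding later (concept drift).

```python
from collections import Counter
from itertools import permutations


def mine_cochange_rules(commits, min_support=2, min_confidence=0.6):
    """Return {(a, b): confidence} for rules 'a changed => b changed'.

    Support is the number of commits touching both files; confidence is
    that count divided by the number of commits touching a.
    """
    file_count = Counter()
    pair_count = Counter()
    for files in commits:
        unique = set(files)
        file_count.update(unique)
        # Ordered pairs, so a => b and b => a are counted separately.
        pair_count.update(permutations(sorted(unique), 2))
    rules = {}
    for (a, b), n in pair_count.items():
        confidence = n / file_count[a]
        if n >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def precision(rules, commits):
    """Fraction of fired predictions whose predicted file really changed."""
    hits = total = 0
    for files in commits:
        changed = set(files)
        for a, b in rules:
            if a in changed:
                total += 1
                hits += b in changed
    return hits / total if total else 0.0


# Hypothetical history, oldest first. In the last two commits sched.c
# starts changing together with net.c instead of sched.h.
history = [
    ["sched.c", "sched.h"],
    ["sched.c", "sched.h", "Makefile"],
    ["sched.c", "sched.h"],
    ["net.c"],
    ["sched.c", "net.c"],
    ["sched.c", "net.c"],
]
train, test = history[:4], history[4:]
rules = mine_cochange_rules(train)
train_precision = precision(rules, train)
test_precision = precision(rules, test)
```

In this toy history the rules fit the training commits perfectly but never hold on the later commits, which is the kind of gap between training and test periods that the abstract attributes to concept drift.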


An Improved Apriori Algorithm for Association Mining Between Physical Fitness Indices of College Students

International Journal of Emerging Technologies in Learning (iJET) ◽

10.3991/ijet.v16i09.22747 ◽

2021 ◽

Vol 16 (09) ◽

pp. 235

Author(s):

Tao Pan

Keyword(s):

College Students ◽

Physical Fitness ◽

Association Rule ◽

Association Rule Mining ◽

Vital Capacity ◽

Correlation Coefficients ◽

Association Mining ◽

Apriori Algorithm ◽

Rule Mining ◽

Hidden Correlations

The physical fitness of college students can be evaluated scientifically from physical education (PE) data. This paper first uses the Apriori algorithm to mine the hidden correlations between physical fitness indices in PE data on college students and to identify the indices most closely associated with their physical fitness. The Apriori algorithm is then improved to reduce the time complexity of association rule mining. With the improved algorithm, several indices were found to exceed the minimum support of 0.2 and the minimum confidence of 0.7, reflecting their important impact on physical fitness. Thus, the physical fitness of college students is significantly influenced by speed, endurance, flexibility, and vital capacity, but not greatly affected by height and weight. The results provide an important guide for the design of PE tests and curricula for college students.
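To make the two thresholds concrete, the following is a minimal sketch of the standard Apriori procedure (not the improved variant from the paper) with minimum support 0.2 and minimum confidence 0.7. The student records and their labels are invented for illustration, and the function names `apriori` and `generate_rules` are assumptions of this sketch.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    n = len(transactions)
    tx = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in tx for item in t}
    frequent = {}
    k = 1
    while current:
        # Keep the candidates whose relative support reaches the threshold.
        level = {}
        for candidate in current:
            support = sum(1 for t in tx if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-item subset must itself be frequent.
        current = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def generate_rules(frequent, min_confidence):
    """Return (lhs, rhs, support, confidence) tuples meeting min_confidence."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


# Hypothetical, already-discretized records (one set of labels per student).
records = [
    {"speed=high", "endurance=high", "fitness=good"},
    {"speed=high", "endurance=high", "fitness=good"},
    {"speed=high", "endurance=low", "fitness=good"},
    {"speed=low", "endurance=low", "fitness=poor"},
    {"speed=low", "endurance=high", "fitness=poor"},
]
frequent = apriori(records, min_support=0.2)
strong = generate_rules(frequent, min_confidence=0.7)
```

On this invented data the rule `speed=high => fitness=good` passes both thresholds, while `endurance=high => fitness=good` has confidence of about 0.67 and is dropped.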


Itemset Representation and Mining the Rules for Huntington’s Dataset

Emerging Science Journal ◽

10.28991/esj-2021-01284 ◽

2021 ◽

Vol 5 (3) ◽

pp. 380-391

Author(s):

Carynthia Kharkongor ◽

Bhabesh Nath

Keyword(s):

Association Rule ◽

Association Rule Mining ◽

Health Management ◽

Early Stage ◽

Cost Effective ◽

Frequent Itemsets ◽

Association Mining ◽

Apriori Algorithm ◽

Rule Mining ◽

Management Domain

Association rule mining is not restricted to market basket analysis; it is also applied in domains such as health care, industry, and networking. In this paper, an association mining algorithm is applied to the health management domain, where it supports decision making by producing rules for the early detection of disease. By examining a patient's personal details and symptoms, association rule mining helps predict and diagnose the disease at an early stage. The dataset used in this experiment is the Huntington's Disease (HD) dataset; HD is a rare disease. The dataset must be held in memory to compute and generate the rules. Storing an item in memory takes 4 bytes if an array data structure is used. Furthermore, if the dataset is very large, storing every detail in memory becomes impractical: it is not cost-effective and consumes a lot of resources. One solution is to represent the itemsets in a way that consumes less memory. Here the items are stored using a set representation, which takes less time and memory than traditional methods. The dataset is mined using the Apriori algorithm, which produces only the frequent itemsets, that is, those with a high probability of occurrence. The algorithm provides prior knowledge of the frequent itemsets, from which the rules are then generated. The memory and time consumption of the set representation is compared with that of the array representation of itemsets.
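The abstract does not spell out its set representation, so the sketch below shows only one common compact encoding, stated here as an assumption: each itemset becomes a single integer bitmask with one bit per distinct item, and a subset test becomes one bitwise AND. The item labels and the helper names `build_index`, `encode`, and `support_count` are hypothetical.

```python
def build_index(transactions):
    """Map each distinct item to a bit position (sorted for determinism)."""
    items = sorted({item for t in transactions for item in t})
    return {item: position for position, item in enumerate(items)}


def encode(itemset, index):
    """Pack an itemset into one integer, setting one bit per item."""
    mask = 0
    for item in itemset:
        mask |= 1 << index[item]
    return mask


def support_count(candidate_mask, transaction_masks):
    """Count transactions that contain every item of the candidate."""
    return sum(
        1 for m in transaction_masks if (m & candidate_mask) == candidate_mask
    )


# Hypothetical transactions with placeholder item labels.
transactions = [["A", "B"], ["A", "C"], ["A", "B", "C"], ["B"]]
index = build_index(transactions)
masks = [encode(t, index) for t in transactions]
ab_support = support_count(encode(["A", "B"], index), masks)
```

With this encoding a transaction occupies one integer regardless of how many items it holds, whereas an array of item identifiers grows by one slot per item.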


Distributed elephant herding optimization for grid-based privacy association rule mining

Data Technologies and Applications ◽

10.1108/dta-07-2019-0104 ◽

2020 ◽

Vol 54 (3) ◽

pp. 365-382

Author(s):

Praveen Kumar Gopagoni ◽

Mohan Rao S K

Keyword(s):

Association Rules ◽

Optimization Algorithm ◽

Association Rule ◽

Association Rule Mining ◽

Frequent Itemset ◽

Association Mining ◽

Apriori Algorithm ◽

Rule Mining ◽

Content Type ◽

Grid Based

Purpose: Association rule mining generates patterns and correlations from the database, which requires a long scanning time, and the computational cost of generating the rules is quite high. Moreover, the candidate rules generated by traditional association rule mining face a huge challenge in terms of time and space, and the process is lengthy. To tackle the issues of the existing methods and to render privacy rules, the paper proposes grid-based privacy association rule mining.

Design/methodology/approach: The primary intention of the research is to design and develop a distributed elephant herding optimization (EHO) for grid-based privacy association rule mining from the database. The proposed rule generation proceeds in two steps. In the first step, the rules are generated using the Apriori algorithm, an effective association rule mining algorithm. In general, the extraction of association rules from the input database is based on confidence and support; in the proposed model these are replaced with probability-based confidence and holo-entropy. In the second step, the generated rules are passed to the grid-based privacy rule mining, which produces privacy-dependent rules based on a novel optimization algorithm and grid-based fitness. The novel optimization algorithm is developed by integrating the distributed concept into the EHO algorithm.

Findings: The method was evaluated on databases taken from the Frequent Itemset Mining Dataset Repository, namely the retail, chess, T10I4D100K and T40I10D100K databases, to demonstrate the effectiveness of the distributed grid-based privacy association rule mining. The proposed method outperformed the existing methods by offering a higher degree of privacy and utility; moreover, the distributed nature of the association rule mining facilitates parallel processing and generates the privacy rules without much computational burden. The rate of hiding capacity, the rate of information preservation and the rate of false rules generated for the proposed method are 0.4468, 0.4488 and 0.0654, respectively, which is better than the existing rule mining methods.

Originality/value: Data mining is performed in a distributed manner through grids that subdivide the input data, and the rules are framed using Apriori-based association mining, a modification of the standard Apriori algorithm in which holo-entropy and probability-based confidence replace support and confidence. The mined rules do not assure privacy; hence, grid-based privacy rules are employed that use the adaptive elephant herding optimization (AEHO) to generate the privacy rules. The AEHO inherits the adaptive nature in the standard EHO, which renders the global optimal solution.


A Novel Market Basket Analysis Using Adaptive Association Rule Mining Algorithm

International Journal of Scientific Research ◽

10.15373/22778179/sep2012/9 ◽

2012 ◽

Vol 1 (4) ◽

pp. 25-28

Author(s):

M. Dhanabhakyam ◽

Dr. M. Punithavalli

Keyword(s):

Association Rule ◽

Association Rule Mining ◽

Market Basket Analysis ◽

Rule Mining ◽

Market Basket ◽

Mining Algorithm


Study of Various Parallel Implementations of Association Rule Mining Algorithm

American Journal Of Advanced Computing ◽

10.15864/ajac.v2i1.94 ◽

2015 ◽

Vol 2 (1) ◽

Author(s):

Sarbani Dasgupta

Keyword(s):

Association Rule ◽

Association Rule Mining ◽

Rule Mining ◽

Mining Algorithm ◽

Parallel Implementations


Prediksi Code Defect Perangkat Lunak Dengan Metode Association Rule Mining dan Cumulative Support Thresholds [Software Code Defect Prediction Using Association Rule Mining and Cumulative Support Thresholds]

Jurnal Buana Informatika ◽

10.24002/jbi.v6i2.408 ◽

2015 ◽

Vol 6 (2) ◽

Author(s):

Rizal Setya Perdana ◽

Umi Laili Yuhana

Keyword(s):

Association Rule ◽

Association Rule Mining ◽

Rule Mining ◽

Program Code

Software quality is a research area within software engineering that plays a significant role in building good-quality software systems. The prediction of software defects, which arise from deviations from the specification process or from anything that may cause operational failure, has been a research topic for more than 30 years. This paper focuses on predicting defects that occur in program code (code defects). The method for handling defects in program code uses software code patterns that potentially lead to defects, drawn from the NASA data set, to predict defects. Pattern discovery uses Association Rule Mining with Cumulative Support Thresholds, which automatically produces the optimal support and confidence values without requiring user input. Testing of the automatic software code defect prediction yielded an accuracy of 82.35%.
