Comparative evaluation of pattern mining techniques: an empirical study

Complex & Intelligent Systems ◽

10.1007/s40747-020-00226-4 ◽

2020 ◽

Author(s):

Anindita Borah ◽

Bhabesh Nath

Keyword(s):

Empirical Study ◽

Pattern Mining ◽

Synthetic Data ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Computational Time ◽

Data Sets ◽

Pros And Cons ◽

Sparse Data Sets

Abstract Pattern mining has emerged as a compelling field of data mining over the years. Literature has bestowed ample endeavors in this field of research ranging from frequent pattern mining to rare pattern mining. A precise and impartial analysis of the existing pattern mining techniques has therefore become essential to widen the scope of data analysis using the notion of pattern mining. This paper is therefore an attempt to provide a comparative scrutiny of the fundamental algorithms in the field of pattern mining through performance analysis based on several decisive parameters. The paper provides a structural classification of the widely referenced techniques in four pattern mining categories: frequent, maximal frequent, closed frequent and rare. It provides an analytical comparison of these techniques based on computational time and memory consumption using benchmark real and synthetic data sets. The results illustrate that tree based approaches perform exceptionally well over level wise approaches in case of dense data sets for all the categories. However, for sparse data sets, level wise approaches performed better than the former ones. This study has been carried out with an aim to analyze the pros and cons of the well known pattern mining techniques under different categories. Through this empirical study, an endeavor has been made to enable the researchers identify some fruitful and promising research directions in one of the most remarkable area of research, pattern mining.

CHIRPS: Explaining random forest classification

Artificial Intelligence Review ◽

10.1007/s10462-020-09833-6 ◽

2020 ◽

Vol 53 (8) ◽

pp. 5747-5788

Author(s):

Julian Hatwell ◽

Mohamed Medhat Gaber ◽

R. Muhammad Atif Azad

Keyword(s):

Random Forest ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Training Data ◽

Frequent Pattern ◽

Data Sets ◽

Random Forest Classification ◽

Human In The Loop ◽

Forest Classification ◽

Unseen Data

Abstract Modern machine learning methods typically produce “black box” models that are opaque to interpretation. Yet, their demand has been increasing in the Human-in-the-Loop processes, that is, those processes that require a human agent to verify, approve or reason about the automated decisions before they can be applied. To facilitate this interpretation, we propose Collection of High Importance Random Path Snippets (CHIRPS); a novel algorithm for explaining random forest classification per data instance. CHIRPS extracts a decision path from each tree in the forest that contributes to the majority classification, and then uses frequent pattern mining to identify the most commonly occurring split conditions. Then a simple, conjunctive form rule is constructed where the antecedent terms are derived from the attributes that had the most influence on the classification. This rule is returned alongside estimates of the rule’s precision and coverage on the training data along with counter-factual details. An experimental study involving nine data sets shows that classification rules returned by CHIRPS have a precision at least as high as the state of the art when evaluated on unseen data (0.91–0.99) and offer a much greater coverage (0.04–0.54). Furthermore, CHIRPS uniquely controls against under- and over-fitting solutions by maximising novel objective functions that are better suited to the local (per instance) explanation setting.

Bi-Directional Constraint Pushing in Frequent Pattern Mining

Data Mining Patterns ◽

10.4018/978-1-59904-162-9.ch002 ◽

2011 ◽

pp. 32-56

Author(s):

Osmar R. Zaïane ◽

Mohammed El-Hajj

Keyword(s):

Pattern Mining ◽

Frequent Pattern Mining ◽

Large Data ◽

Frequent Itemset ◽

Frequent Pattern ◽

Frequent Itemset Mining ◽

Data Sets ◽

Itemset Mining ◽

Transactional Databases ◽

The Cost

Frequent Itemset Mining (FIM) is a key component of many algorithms that extract patterns from transactional databases. For example, FIM can be leveraged to produce association rules, clusters, classifiers or contrast sets. This capability provides a strategic resource for decision support, and is most commonly used for market basket analysis. One challenge for frequent itemset mining is the potentially huge number of extracted patterns, which can eclipse the original database in size. In addition to increasing the cost of mining, this makes it more difficult for users to find the valuable patterns. Introducing constraints to the mining process helps mitigate both issues. Decision makers can restrict discovered patterns according to specified rules. By applying these restrictions as early as possible, the cost of mining can be constrained. For example, users may be interested in purchases whose total price exceeds $100, or whose items cost between $50 and $100. In cases of extremely large data sets, pushing constraints sequentially is not enough and parallelization becomes a must. However, specific design is needed to achieve sizes never reported before in the literature.

Apriori Algorithm through RapidMiner for Age Patterns of Homeless and Beggars

Indonesian Journal of Artificial Intelligence and Data Mining ◽

10.24014/ijaidm.v1i2.5670 ◽

2018 ◽

Vol 1 (2) ◽

pp. 86

Author(s):

Wirta Agustin ◽

Yulya Muharmi

Keyword(s):

Data Mining ◽

Urban Areas ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Data Sets ◽

Apriori Algorithm ◽

Age Patterns ◽

Rule Method ◽

Algorithm Implementation

Homeless and beggars are one of the problems in urban areas because they can interfere public order, security, stability and urban development. The efforts conducted are still focused on how to manage homeless and beggars, but not for the prevention. One method that can be done to solve this problem is by determining the age pattern of homeless and beggars by implementing Algoritma Apriori. Apriori Algorithm is an Association Rule method in data mining to determine frequent item set that serves to help in finding patterns in a data (frequent pattern mining). The manual calculation through Apriori Algorithm obtaines combination pattern of 11 rules with a minimum support value of 25% and the highest confidence value of 100%. The evaluation of the Apriori Algorithm implementation is using the RapidMiner. RapidMiner application is one of the data mining processing software, including text analysis, extracting patterns from data sets and combining them with statistical methods, artificial intelligence, and databases to obtain high quality information from processed data. The test results showed a comparison of the age patterns of homeless and beggars who had the potential to become homeless and beggars from of testing with the RapidMiner application and manual calculations using the Apriori Algorithm.

Penerapan Data Mining Menggunakan Algoritma Apriori untuk Menentukan Pola Penyebab Gelandangan dan Pengemis

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.2020721376 ◽

2020 ◽

Vol 7 (2) ◽

pp. 229

Author(s):

Wirta Agustin ◽

Yulya Muharmi

Keyword(s):

Data Mining ◽

Association Rule ◽

Urban Areas ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Itemset ◽

Frequent Pattern ◽

Data Sets ◽

Apriori Algorithm ◽

Data Set

Gelandangan dan pengemis salah satu masalah yang ada di daerah perkotaan, karena dapat mengganggu ketertiban umum, keamanan, stabilitas dan pembangunan kota. Upaya yang dilakukan saat ini masih fokus pada cara penanganan gelandangan dan pengemis, belum untuk pencegahan. Salah satu cara yang bisa dilakukan adalah dengan menentukan pola usia gelandangan dan pengemis. Algoritma Apriori sebuah metode Association Rule dalam data mining untuk menentukan frequent itemset yang berfungsi membantu menemukan pola dalam sebuah data (frequent pattern mining). Perhitungan manual menggunakan algoritma apriori, menghasilkan pola kombinasi sebanyak 3 rules dengan nilai minimum support sebesar 30% dan nilai confidence tertinggi sebesar 100%. Pengujian penerapan Algoritma Apriori menggunakan aplikasi RapidMiner. RapidMiner salah satu software pengolahan data mining, diantaranya analisis teks, mengekstrak pola-pola dari data set dan mengkombinasikannya dengan metode statistika, kecerdasan buatan, dan database untuk mendapatkan informasi bermutu tinggi dari data yang diolah. Hasil pengujian menunjukkan perbandingan pola usia gelandangan dan pengemis yang berpotensi menjadi gelandangan dan pengemis. Berdasarkan hasil pengujian aplikasi RapidMiner dan hasil perhitungan manual Algoritma Apriori, dapat disimpulkan sesuai kriteria pengujian, bahiwa pola (rules) usia dan nilai confidence (c) hasil perhitungan manual Algoritma Apriori tidak mendekati nilai hasil pengujian menggunakan aplikasi RapidMiner, maka tingkat keakuratan pengujian rendah, yaitu 37.5 %. Abstract Homeless and beggars are one of the problems in urban areas as they possibly disrupt public order, security, stability and urban development. The efforts conducted are still focusing on managing the existing homeless and beggars instead of preventing the potential ones. One of the methods used for solving this problem is Algoritma Apriori which determines the age pattern of homeless and beggars. Apriori Algorithm is an Association Rule method in data mining to determine frequent item set that serves to help in finding patterns in a data (frequent pattern mining). The manual calculation through Apriori Algorithm obtains combination pattern of 3 rules with a minimum support value of 30% and the highest confidence value of 100%. These patterns were refences for the incharged department in precaution action of homeless and beggars arising numbers. Apriori Algorithm testing uses the RapidMiner application which is one of data mining processing software, including text analysis, extracting patterns from data sets and combining them with statistical methods, artificial intelligence, and databases to obtain high quality information from processed data. Based on the results of the said testing, it can be concluded that the level of accuracy test is low, i.e. 37.5%.

Optimized Frequent Pattern Mining for Classified Data Sets

International Journal of Computer Applications ◽

10.5120/504-821 ◽

2010 ◽

Vol 1 (27) ◽

pp. 25-39 ◽

Cited By ~ 1

Author(s):

A Raghunathan ◽

K Murugesan

Keyword(s):

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Data Sets

An Adaptive Data Distribution Through Tree Rules in Frequent Pattern Mining

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit183894 ◽

2018 ◽

pp. 300-305

Keyword(s):

Information Sharing ◽

Pattern Mining ◽

Data Distribution ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

General Development ◽

Secure Information ◽

Evaluation Parameters ◽

Secure Information Sharing

Information sharing among the associations is a general development in a couple of zones like business headway and exhibiting. As bit of the touchy principles that ought to be kept private may be uncovered and such disclosure of delicate examples may impacts the advantages of the association that have the data. Subsequently the standards which are delicate must be secured before sharing the data. In this paper to give secure information sharing delicate guidelines are bothered first which was found by incessant example tree. Here touchy arrangement of principles are bothered by substitution. This kind of substitution diminishes the hazard and increment the utility of the dataset when contrasted with different techniques. Examination is done on certifiable dataset. Results shows that proposed work is better as appear differently in relation to various past strategies on the introduce of evaluation parameters.

Learning and Synchronized Privacy Preserving Frequent Pattern Mining

Journal of Software ◽

10.3724/sp.j.1001.2011.04000 ◽

2011 ◽

Vol 22 (8) ◽

pp. 1749-1760

Author(s):

Yu-Hong GUO ◽

Yun-Hai TONG ◽

Shi-Wei TANG ◽

Leng-Dong WU

Keyword(s):

Pattern Mining ◽

Frequent Pattern Mining ◽

Privacy Preserving ◽

Frequent Pattern

RAKING: An Efficient K-Maximal Frequent Pattern Mining Algorithm on Uncertain Graph Database

Chinese Journal of Computers ◽

10.3724/sp.j.1016.2010.01387 ◽

2010 ◽

Vol 33 (8) ◽

pp. 1387-1395 ◽

Cited By ~ 4

Author(s):

Meng HAN ◽

Wei ZHANG ◽

Jian-Zhong LI

Keyword(s):

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern ◽

Graph Database ◽

Uncertain Graph ◽

Mining Algorithm ◽

Maximal Frequent Pattern

Sliding window based weighted maximal frequent pattern mining over data streams

Expert Systems with Applications ◽

10.1016/j.eswa.2013.07.094 ◽

2014 ◽

Vol 41 (2) ◽

pp. 694-708 ◽

Cited By ~ 64

Author(s):

Gangin Lee ◽

Unil Yun ◽

Keun Ho Ryu

Keyword(s):

Data Streams ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Sliding Window ◽

Frequent Pattern ◽

Maximal Frequent Pattern

Deep learning frequent pattern mining on static semi structured data streams for improving fast speed and complex data streams

2021 7th International Conference on Optimization and Applications (ICOA) ◽

10.1109/icoa51614.2021.9442621 ◽

2021 ◽

Author(s):

G. Suseendran ◽

D. Balaganesh ◽

D. Akila ◽

Souvik Pal

Keyword(s):

Deep Learning ◽

Data Streams ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Structured Data ◽

Frequent Pattern ◽

Complex Data ◽

Fast Speed