Comparative evaluation of pattern mining techniques: an empirical study

Author(s):  
Anindita Borah ◽  
Bhabesh Nath

Abstract Pattern mining has emerged as a compelling field of data mining over the years. The literature contains extensive work in this field, ranging from frequent pattern mining to rare pattern mining. A precise and impartial analysis of the existing pattern mining techniques has therefore become essential to widen the scope of data analysis based on pattern mining. This paper attempts to provide a comparative assessment of the fundamental algorithms in the field through performance analysis based on several decisive parameters. The paper provides a structural classification of the widely referenced techniques in four pattern mining categories: frequent, maximal frequent, closed frequent and rare. It provides an analytical comparison of these techniques based on computational time and memory consumption using benchmark real and synthetic data sets. The results illustrate that tree-based approaches perform considerably better than level-wise approaches on dense data sets in all categories. For sparse data sets, however, level-wise approaches performed better than tree-based ones. This study has been carried out with the aim of analyzing the pros and cons of the well-known pattern mining techniques in the different categories. Through this empirical study, an effort has been made to help researchers identify fruitful and promising research directions in pattern mining, one of the most remarkable areas of research.
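The four categories named in this abstract can be illustrated on a toy database. The sketch below is not taken from the paper; it assumes the usual textbook definitions (an itemset is frequent if its support count reaches a threshold, closed if no proper superset has the same support, maximal if no proper superset is frequent) and, as a simplification, treats every occurring itemset below the threshold as rare. The names `support_counts`, `classify` and `TOY_DB` are hypothetical, and the brute-force enumeration is only practical for a handful of items.

```python
from itertools import combinations

def support_counts(transactions):
    """Count, by brute force, how many transactions contain each itemset."""
    items = sorted(set().union(*transactions))
    counts = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            n = sum(1 for t in transactions if candidate <= t)
            if n > 0:  # keep only itemsets that occur at least once
                counts[candidate] = n
    return counts

def classify(transactions, min_count):
    """Split occurring itemsets into frequent, closed, maximal and rare sets."""
    counts = support_counts(transactions)
    frequent = {s for s, n in counts.items() if n >= min_count}
    # Assumed simplification: "rare" means occurring but below the threshold.
    rare = {s for s, n in counts.items() if n < min_count}
    # Closed frequent: no proper superset has the same support count.
    closed = {s for s in frequent
              if not any(s < t and counts[t] == counts[s] for t in counts)}
    # Maximal frequent: no proper superset is frequent.
    maximal = {s for s in frequent if not any(s < t for t in frequent)}
    return frequent, closed, maximal, rare

# Hypothetical toy database: four transactions over items a, b, c, d.
TOY_DB = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("ad")]
frequent, closed, maximal, rare = classify(TOY_DB, min_count=2)
```

On this toy input every maximal itemset is also closed, and every closed itemset is frequent, which is the containment these categories are normally expected to satisfy.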

2020 ◽  
Vol 53 (8) ◽  
pp. 5747-5788
Author(s):  
Julian Hatwell ◽  
Mohamed Medhat Gaber ◽  
R. Muhammad Atif Azad

Abstract Modern machine learning methods typically produce “black box” models that are opaque to interpretation. Yet, their demand has been increasing in the Human-in-the-Loop processes, that is, those processes that require a human agent to verify, approve or reason about the automated decisions before they can be applied. To facilitate this interpretation, we propose Collection of High Importance Random Path Snippets (CHIRPS); a novel algorithm for explaining random forest classification per data instance. CHIRPS extracts a decision path from each tree in the forest that contributes to the majority classification, and then uses frequent pattern mining to identify the most commonly occurring split conditions. Then a simple, conjunctive form rule is constructed where the antecedent terms are derived from the attributes that had the most influence on the classification. This rule is returned alongside estimates of the rule’s precision and coverage on the training data along with counter-factual details. An experimental study involving nine data sets shows that classification rules returned by CHIRPS have a precision at least as high as the state of the art when evaluated on unseen data (0.91–0.99) and offer a much greater coverage (0.04–0.54). Furthermore, CHIRPS uniquely controls against under- and over-fitting solutions by maximising novel objective functions that are better suited to the local (per instance) explanation setting.
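The general idea of counting split conditions across the decision paths that agree with the majority vote can be sketched as follows. This is a heavily simplified illustration and not the CHIRPS implementation: it treats two conditions as the same only when feature, operator and threshold match exactly, it keeps every condition above a frequency threshold instead of optimizing the objective functions mentioned in the abstract, and all names (`majority_paths`, `frequent_conditions`, `precision_coverage`) and data are hypothetical.

```python
from collections import Counter

def majority_paths(paths, votes):
    """Keep only the paths of trees that voted for the majority class."""
    majority = Counter(votes).most_common(1)[0][0]
    return majority, [p for p, v in zip(paths, votes) if v == majority]

def frequent_conditions(paths, min_support):
    """Return split conditions that appear in at least min_support of the paths."""
    counts = Counter(cond for path in paths for cond in set(path))
    n = len(paths)
    return [c for c, k in counts.most_common() if k / n >= min_support]

def satisfies(row, cond):
    """Evaluate one (feature_index, operator, threshold) condition on a row."""
    feature, op, threshold = cond
    return row[feature] <= threshold if op == "<=" else row[feature] > threshold

def precision_coverage(rule, X, y, target):
    """Precision and coverage of a conjunctive rule on labelled data."""
    covered = [label for row, label in zip(X, y)
               if all(satisfies(row, c) for c in rule)]
    coverage = len(covered) / len(X)
    precision = covered.count(target) / len(covered) if covered else 0.0
    return precision, coverage

# Hypothetical decision paths for one instance, one path per tree.
paths = [
    [(0, "<=", 5.0), (1, ">", 2.0)],
    [(0, "<=", 5.0), (2, "<=", 1.0)],
    [(0, "<=", 5.0), (1, ">", 2.0)],
    [(0, ">", 5.0)],
]
votes = ["A", "A", "A", "B"]
target, agreeing = majority_paths(paths, votes)
rule = frequent_conditions(agreeing, min_support=0.6)

# Hypothetical training data used to estimate precision and coverage.
X = [[4, 3, 0], [3, 1, 0], [6, 3, 0], [2, 4, 5]]
y = ["A", "B", "B", "A"]
precision, coverage = precision_coverage(rule, X, y, target)
```

The resulting `rule` is a conjunction of the conditions that most of the agreeing trees share, which mirrors the "simple, conjunctive form rule" the abstract describes only in spirit.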


2011 ◽  
pp. 32-56
Author(s):  
Osmar R. Zaïane ◽  
Mohammed El-Hajj

Frequent Itemset Mining (FIM) is a key component of many algorithms that extract patterns from transactional databases. For example, FIM can be leveraged to produce association rules, clusters, classifiers or contrast sets. This capability provides a strategic resource for decision support, and is most commonly used for market basket analysis. One challenge for frequent itemset mining is the potentially huge number of extracted patterns, which can eclipse the original database in size. In addition to increasing the cost of mining, this makes it more difficult for users to find the valuable patterns. Introducing constraints to the mining process helps mitigate both issues. Decision makers can restrict discovered patterns according to specified rules. By applying these restrictions as early as possible, the cost of mining can be constrained. For example, users may be interested in purchases whose total price exceeds $100, or whose items cost between $50 and $100. In cases of extremely large data sets, pushing constraints sequentially is not enough and parallelization becomes a must. However, specific design is needed to achieve sizes never reported before in the literature.
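The two price constraints used as examples above can be sketched on a toy database. The code below is an illustration under stated assumptions, not the authors' method: the per-item price range is applied before mining by dropping disallowed items, while the "total price exceeds" constraint is applied only when reporting, because a superset of a cheap itemset may still exceed the limit and so cannot be pruned safely. The function name `mine_with_constraints`, its parameters, and the data are hypothetical.

```python
def mine_with_constraints(transactions, prices, min_count,
                          item_price_range=None, min_total_price=None):
    """Level-wise frequent itemset mining with two simple price constraints."""
    # Per-item constraint: remove disallowed items before any counting.
    if item_price_range is not None:
        low, high = item_price_range
        allowed = {i for i, p in prices.items() if low <= p <= high}
        transactions = [t & allowed for t in transactions]

    items = sorted(set().union(*transactions))
    results = {}
    level = [frozenset([i]) for i in items]
    while level:
        survivors = []
        for cand in level:
            n = sum(1 for t in transactions if cand <= t)
            if n >= min_count:
                survivors.append(cand)
                total = sum(prices[i] for i in cand)
                # Total-price constraint filters the output only; the
                # itemset still survives so its supersets are explored.
                if min_total_price is None or total > min_total_price:
                    results[cand] = n
        # Join frequent itemsets that differ in exactly one item.
        next_level = {a | b for a in survivors for b in survivors
                      if len(a | b) == len(a) + 1}
        level = sorted(next_level, key=sorted)
    return results

# Hypothetical prices and purchases.
PRICES = {"laptop": 900, "mouse": 60, "keyboard": 80, "cable": 10}
PURCHASES = [
    frozenset({"laptop", "mouse", "keyboard"}),
    frozenset({"mouse", "keyboard", "cable"}),
    frozenset({"mouse", "keyboard"}),
    frozenset({"laptop", "cable"}),
]

mid_priced = mine_with_constraints(PURCHASES, PRICES, min_count=2,
                                   item_price_range=(50, 100))
expensive = mine_with_constraints(PURCHASES, PRICES, min_count=2,
                                  min_total_price=100)
```

Filtering items up front shrinks every later counting pass, which is the cost reduction the passage attributes to applying restrictions as early as possible.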


Author(s):  
Wirta Agustin ◽  
Yulya Muharmi

Homeless people and beggars are one of the problems in urban areas because they can interfere with public order, security, stability and urban development. Current efforts still focus on managing homeless people and beggars rather than on prevention. One way to address this problem is to determine the age pattern of homeless people and beggars by applying the Apriori algorithm. The Apriori algorithm is an association rule method in data mining that determines frequent itemsets, which helps in finding patterns in data (frequent pattern mining). The manual calculation with the Apriori algorithm yields a combination pattern of 11 rules with a minimum support of 25% and a highest confidence of 100%. The Apriori implementation is evaluated using RapidMiner. RapidMiner is a data mining software package whose capabilities include text analysis, extracting patterns from data sets and combining them with statistical methods, artificial intelligence, and databases to obtain high-quality information from processed data. The test results compare the age patterns of people with the potential to become homeless or beggars obtained from RapidMiner with those obtained from the manual Apriori calculation.
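The support and confidence values mentioned above can be computed with a small level-wise Apriori sketch. The records below are invented for illustration and are not the study's data; the function names `apriori` and `rules` are likewise hypothetical. Support is the fraction of records containing an itemset, and the confidence of a rule X → Y is support(X ∪ Y) / support(X).

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    n = len(transactions)
    support = {}
    level = {frozenset([i]) for t in transactions for i in t}
    k = 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        frequent = {c: cnt / n for c, cnt in counts.items()
                    if cnt / n >= min_support}
        support.update(frequent)
        k += 1
        # Join step: combine frequent itemsets into candidates of size k.
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        level = {c for c in candidates
                 if all(frozenset(s) in frequent
                        for s in combinations(c, k - 1))}
    return support

def rules(support, min_confidence):
    """Derive (antecedent, consequent, support, confidence) tuples."""
    out = []
    for itemset, sup in support.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = sup / support[lhs]
                if conf >= min_confidence:
                    out.append((lhs, itemset - lhs, sup, conf))
    return out

# Hypothetical records: age group, sex and status of four individuals.
RECORDS = [
    frozenset({"age_41_60", "male", "beggar"}),
    frozenset({"age_41_60", "male", "homeless"}),
    frozenset({"age_41_60", "female", "beggar"}),
    frozenset({"age_18_40", "male", "beggar"}),
]
support = apriori(RECORDS, min_support=0.5)
found = rules(support, min_confidence=0.6)
```

Raising `min_support` or `min_confidence` shrinks the rule set, which is one plausible reason two runs with different thresholds report different numbers of rules.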


2020 ◽  
Vol 7 (2) ◽  
pp. 229
Author(s):  
Wirta Agustin ◽  
Yulya Muharmi

Homeless people and beggars are one of the problems in urban areas, because they can disturb public order, security, stability and urban development. Current efforts still focus on how to handle homeless people and beggars, not yet on prevention. One approach that can be taken is to determine the age pattern of homeless people and beggars. The Apriori algorithm is an Association Rule method in data mining for determining frequent itemsets, which helps find patterns in data (frequent pattern mining). Manual calculation using the Apriori algorithm produced a combination pattern of 3 rules with a minimum support of 30% and a highest confidence of 100%. The implementation of the Apriori algorithm was tested using the RapidMiner application. RapidMiner is a data mining software package whose capabilities include text analysis, extracting patterns from data sets and combining them with statistical methods, artificial intelligence, and databases to obtain high-quality information from the processed data. The test results show a comparison of the age patterns of people with the potential to become homeless or beggars. Based on the RapidMiner test results and the manual Apriori calculations, it can be concluded, according to the test criteria, that the age patterns (rules) and confidence values (c) from the manual Apriori calculation do not approach the values obtained from testing with RapidMiner, so the accuracy level of the test is low, namely 37.5%.

Abstract Homeless people and beggars are one of the problems in urban areas as they may disrupt public order, security, stability and urban development. Current efforts still focus on managing the existing homeless people and beggars instead of preventing potential new cases. One method for addressing this problem is the Apriori algorithm, used here to determine the age pattern of homeless people and beggars. The Apriori algorithm is an Association Rule method in data mining that determines frequent itemsets, which helps in finding patterns in data (frequent pattern mining). The manual calculation with the Apriori algorithm obtains a combination pattern of 3 rules with a minimum support of 30% and a highest confidence of 100%. These patterns served as references for the responsible department in taking precautionary action against the rising number of homeless people and beggars. The Apriori algorithm is tested using the RapidMiner application, a data mining software package whose capabilities include text analysis, extracting patterns from data sets and combining them with statistical methods, artificial intelligence, and databases to obtain high-quality information from processed data. Based on the results of this testing, it can be concluded that the accuracy level of the test is low, i.e. 37.5%.


Information sharing among organizations is a general trend in several areas such as business promotion and marketing. Some of the sensitive rules that should be kept private may be disclosed, and such disclosure of sensitive patterns may affect the profits of the organization that owns the data. The sensitive rules must therefore be protected before the data is shared. In this paper, to provide secure information sharing, the sensitive rules found by a frequent pattern tree are perturbed first. The sensitive set of rules is perturbed by substitution. This kind of substitution reduces the risk and increases the utility of the dataset compared with other methods. The experiment is carried out on a real-world dataset. The results show that the proposed work performs better than various previous methods on the basis of the evaluation parameters.


2011 ◽  
Vol 22 (8) ◽  
pp. 1749-1760
Author(s):  
Yu-Hong GUO ◽  
Yun-Hai TONG ◽  
Shi-Wei TANG ◽  
Leng-Dong WU
