Apriori Algorithm through RapidMiner for Age Patterns of Homeless and Beggars

Author(s):  
Wirta Agustin ◽  
Yulya Muharmi

Homeless people and beggars are one of the problems of urban areas because they can interfere with public order, security, stability, and urban development. Current efforts are still focused on how to manage homeless people and beggars, not on prevention. One way to address this problem is to determine the age pattern of homeless people and beggars by implementing the Apriori Algorithm. The Apriori Algorithm is an Association Rule method in data mining for determining frequent itemsets, which helps find patterns in data (frequent pattern mining). Manual calculation with the Apriori Algorithm obtains a combination pattern of 11 rules with a minimum support value of 25% and a highest confidence value of 100%. The implementation of the Apriori Algorithm was evaluated using RapidMiner, a data mining software package that supports text analysis and extracting patterns from data sets, combining them with statistical methods, artificial intelligence, and databases to obtain high-quality information from the processed data. The test results compare the age patterns of people at risk of becoming homeless and beggars obtained from the RapidMiner application with those from the manual Apriori Algorithm calculations.
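As a rough illustration of the support and confidence thresholds mentioned in this abstract, the following Python sketch mines association rules from a tiny, made-up set of transactions; the items, the brute-force candidate generation, and the helper names are illustrative and not taken from the paper.

```python
from itertools import combinations

# Hypothetical transactions of age-range labels; the paper's real data is not reproduced here.
transactions = [
    {"child", "teen"},
    {"teen", "adult"},
    {"adult", "elderly"},
    {"teen", "adult"},
]

MIN_SUPPORT = 0.25      # 25% minimum support, as in the abstract
MIN_CONFIDENCE = 1.0    # report only rules with 100% confidence

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Level-wise filtering of candidates by minimum support (candidate generation here is
# brute force for brevity; real Apriori joins frequent (k-1)-itemsets).
items = sorted({i for t in transactions for i in t})
frequent = []
for k in range(1, len(items) + 1):
    level = [frozenset(c) for c in combinations(items, k) if support(frozenset(c)) >= MIN_SUPPORT]
    if not level:
        break
    frequent.extend(level)

# Derive rules A -> B with confidence = support(A ∪ B) / support(A).
for itemset in (f for f in frequent if len(f) > 1):
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            consequent = itemset - antecedent
            conf = support(itemset) / support(antecedent)
            if conf >= MIN_CONFIDENCE:
                print(f"{set(antecedent)} -> {set(consequent)} "
                      f"(support={support(itemset):.2f}, confidence={conf:.2f})")
```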

2020 ◽  
Vol 7 (2) ◽  
pp. 229
Author(s):  
Wirta Agustin ◽  
Yulya Muharmi

Homeless people and beggars are one of the problems in urban areas because they can disturb public order, security, stability, and urban development. Current efforts still focus on handling homeless people and beggars, not yet on prevention. One way to address this is to determine the age pattern of homeless people and beggars. The Apriori Algorithm is an Association Rule method in data mining for determining frequent itemsets, which helps find patterns in data (frequent pattern mining). Manual calculation with the Apriori Algorithm produces a combination pattern of 3 rules with a minimum support value of 30% and a highest confidence value of 100%. The implementation of the Apriori Algorithm was tested with the RapidMiner application. RapidMiner is a data mining software package that supports text analysis and extracting patterns from data sets, combining them with statistical methods, artificial intelligence, and databases to obtain high-quality information from the processed data. The test results show a comparison of the age patterns of people at risk of becoming homeless and beggars. Based on the RapidMiner test results and the manual Apriori Algorithm calculations, it can be concluded, according to the test criteria, that the age patterns (rules) and confidence values (c) from the manual Apriori Algorithm calculation are not close to the values obtained with the RapidMiner application, so the accuracy of the test is low, namely 37.5%.

Abstract

Homeless people and beggars are one of the problems in urban areas as they can disrupt public order, security, stability, and urban development. Current efforts still focus on managing the existing homeless and beggars instead of preventing potential ones. One method used to address this problem is the Apriori Algorithm, which determines the age pattern of homeless people and beggars. The Apriori Algorithm is an Association Rule method in data mining for determining frequent itemsets, which helps find patterns in data (frequent pattern mining). The manual calculation through the Apriori Algorithm obtains a combination pattern of 3 rules with a minimum support value of 30% and a highest confidence value of 100%. These patterns serve as references for the department in charge in taking precautionary action against rising numbers of homeless people and beggars. The Apriori Algorithm was tested with the RapidMiner application, a data mining software package that supports text analysis and extracting patterns from data sets, combining them with statistical methods, artificial intelligence, and databases to obtain high-quality information from processed data. Based on the results of this testing, it can be concluded that the accuracy of the test is low, i.e. 37.5%.
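The 37.5% figure is described as the degree to which the manually derived rules and confidence values match the RapidMiner output. Below is a minimal sketch of how such an agreement rate could be computed, assuming both tools return rules as antecedent/consequent pairs with a confidence value; the rule sets and matching tolerance are hypothetical, not the paper's results.

```python
# Hypothetical rule sets in the form (antecedent, consequent) -> confidence;
# the actual rules from the paper are not reproduced here.
manual_rules = {
    (("A",), ("B",)): 1.00,
    (("B",), ("C",)): 0.75,
    (("A", "B"), ("C",)): 0.60,
    (("C",), ("A",)): 0.50,
}
rapidminer_rules = {
    (("A",), ("B",)): 1.00,
    (("B",), ("C",)): 0.40,
    (("C",), ("A",)): 0.90,
    (("D",), ("A",)): 0.80,
}

TOLERANCE = 0.05  # how close two confidence values must be to count as a match (assumed)

matches = sum(
    1
    for rule, conf in manual_rules.items()
    if rule in rapidminer_rules and abs(rapidminer_rules[rule] - conf) <= TOLERANCE
)
accuracy = matches / len(manual_rules)
print(f"Agreement: {accuracy:.1%}")  # 1 of 4 rules match here -> 25.0%
```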


2012 ◽  
Vol 195-196 ◽  
pp. 984-986
Author(s):  
Ming Ru Zhao ◽  
Yuan Sun ◽  
Jian Guo ◽  
Ping Ping Dong

Frequent itemset mining is an important data mining task and a central theme in data mining research. The Apriori algorithm is one of the most important algorithms for mining frequent itemsets. However, the Apriori algorithm scans the database many times, so its efficiency is relatively low. This paper therefore studies a frequent itemset mining algorithm based on a cross linker. Compared with the classical algorithm, the improved algorithm shows clear advantages.
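The efficiency criticism above stems from Apriori performing one full database pass per candidate length. The sketch below makes that cost explicit on a toy database; it shows only the baseline behaviour being criticised, not the cross-linker-based improvement itself.

```python
from itertools import combinations

# Tiny illustrative database; each full pass over it is counted explicitly.
database = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"}, {"a", "b", "c"}]
MIN_SUPPORT_COUNT = 2
scans = 0

def count_support(candidates):
    """One full database scan: count how many transactions contain each candidate."""
    global scans
    scans += 1
    return {c: sum(c <= t for t in database) for c in candidates}

# Level-1 candidates are single items; level k+1 candidates join frequent k-itemsets.
candidates = [frozenset([i]) for i in sorted({i for t in database for i in t})]
k = 1
while candidates:
    counts = count_support(candidates)
    frequent = [c for c, n in counts.items() if n >= MIN_SUPPORT_COUNT]
    k += 1
    candidates = list({a | b for a in frequent for b in frequent if len(a | b) == k})

print(f"Database scans performed: {scans}")  # one scan per candidate length
```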


2021 ◽  
Vol 10 (1) ◽  
pp. 390-403
Author(s):  
M. Sornalakshmi ◽  
S. Balamurali ◽  
M. Venkatesulu ◽  
M. Navaneetha Krishnan ◽  
Lakshmana Kumar Ramasamy ◽  
...  

The use of data mining technology in healthcare is growing, as knowledge discovery and data mining have become essential to the medical sector. Healthcare organizations generate and gather large quantities of information daily. The use of IT allows the automation of data mining over electronic records, which yields interesting patterns, removes manual extraction tasks, secures medical records, saves lives, cuts the cost of medical care, and enables early detection of infectious diseases. In this research paper an improved Apriori algorithm named Enhanced Parallel and Distributed Apriori (EPDA) is presented for the healthcare industry, based on the scalable Hadoop MapReduce environment. The main aim of the proposed work is to reduce the heavy demand for resources and the communication overhead when frequent data are extracted, through split-frequent data generated locally and the early removal of unusual data. The paper reports test results in which EPDA is evaluated in terms of runtime and the number of rules generated on a healthcare database with different minimum support values.
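EPDA itself runs on Hadoop MapReduce; as a rough, single-process illustration of the map/reduce counting pattern it builds on, the sketch below counts item frequencies across data splits and prunes infrequent items early. The data, split layout, and function names are assumptions for illustration only.

```python
from collections import Counter
from itertools import chain

# Hypothetical healthcare-style transactions (e.g., sets of diagnosis codes); not the paper's data.
partitions = [
    [{"D1", "D2"}, {"D2", "D3"}],          # data split handled by one mapper
    [{"D1", "D2", "D3"}, {"D2"}],          # data split handled by another mapper
]
MIN_SUPPORT_COUNT = 2

def map_phase(split):
    """Mapper: emit (item, 1) pairs for every item in every local transaction."""
    return [(item, 1) for transaction in split for item in transaction]

def reduce_phase(pairs):
    """Reducer: sum the counts emitted for each item across all mappers."""
    totals = Counter()
    for item, count in pairs:
        totals[item] += count
    return totals

emitted = chain.from_iterable(map_phase(split) for split in partitions)
global_counts = reduce_phase(emitted)

# Early pruning of infrequent items, mirroring the idea of discarding them before
# longer candidates are generated.
frequent_items = {item for item, n in global_counts.items() if n >= MIN_SUPPORT_COUNT}
print(frequent_items)  # {'D1', 'D2', 'D3'} with this toy data
```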


Author(s):  
Anne Denton

Time series data is of interest to most science and engineering disciplines, and analysis techniques for it have been developed over hundreds of years. In recent years, however, there have been new developments in data mining techniques, such as frequent pattern mining, that take a different perspective on data. Traditional techniques were not designed for such pattern-oriented approaches. As a result, there is a significant need for research that extends traditional time-series analysis, in particular clustering, to the requirements of these new data mining algorithms.


2020 ◽  
Vol 53 (8) ◽  
pp. 5747-5788
Author(s):  
Julian Hatwell ◽  
Mohamed Medhat Gaber ◽  
R. Muhammad Atif Azad

Abstract Modern machine learning methods typically produce “black box” models that are opaque to interpretation. Yet demand for them has been increasing in human-in-the-loop processes, that is, processes that require a human agent to verify, approve or reason about automated decisions before they can be applied. To facilitate this interpretation, we propose Collection of High Importance Random Path Snippets (CHIRPS), a novel algorithm for explaining random forest classification per data instance. CHIRPS extracts a decision path from each tree in the forest that contributes to the majority classification, and then uses frequent pattern mining to identify the most commonly occurring split conditions. A simple, conjunctive-form rule is then constructed whose antecedent terms are derived from the attributes that had the most influence on the classification. This rule is returned alongside estimates of the rule’s precision and coverage on the training data, along with counter-factual details. An experimental study involving nine data sets shows that classification rules returned by CHIRPS have a precision at least as high as the state of the art when evaluated on unseen data (0.91–0.99) and offer much greater coverage (0.04–0.54). Furthermore, CHIRPS uniquely controls against under- and over-fitting solutions by maximising novel objective functions that are better suited to the local (per instance) explanation setting.
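A hedged sketch of the first CHIRPS step described above: collecting the split conditions on the decision paths of the trees that vote with the majority class for one instance, then counting the most common conditions. It assumes a scikit-learn random forest (the paper's actual implementation is not specified here) and omits CHIRPS's rule construction, scoring, and counter-factual details.

```python
from collections import Counter
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Fit a small forest; the data set and hyperparameters are illustrative only.
X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

instance = X[0:1]                                      # explain a single instance
majority_idx = int(np.argmax(forest.predict_proba(instance)[0]))  # majority-vote class index

conditions = Counter()
for tree in forest.estimators_:
    if int(tree.predict(instance)[0]) != majority_idx:
        continue                                       # only trees that agree with the majority vote
    node_ids = tree.decision_path(instance).indices    # nodes visited by this instance
    t = tree.tree_
    for node in node_ids:
        if t.feature[node] < 0:
            continue                                   # skip leaf nodes
        op = "<=" if instance[0, t.feature[node]] <= t.threshold[node] else ">"
        conditions[(int(t.feature[node]), op)] += 1

# The most commonly occurring split conditions across the agreeing trees.
for (feature, op), freq in conditions.most_common(5):
    print(f"feature[{feature}] {op} threshold (seen in {freq} paths)")
```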


2013 ◽  
Vol 443 ◽  
pp. 402-406 ◽  
Author(s):  
Shang Gao ◽  
Mei Mei Li

With the rapid growth in the number of mobile phone users, a large amount of graph data has accumulated, and graph data mining has gradually become a hot research area. Traditional data mining tasks such as clustering, classification, and frequent pattern mining have gradually been extended to graph data. This paper introduces the current research progress of graph data mining technology, summarizes the characteristics, practical significance, main problems, and application scenarios of graph data mining, and discusses and forecasts future directions, with research on uncertain graph data in particular becoming a trend and a hot spot.


2017 ◽  
Vol 10 (13) ◽  
pp. 191
Author(s):  
Nikhil Jamdar ◽  
A Vijayalakshmi

Many algorithms are available in data mining to search for interesting patterns in transactional databases of precise data. Frequent pattern mining is a technique for finding frequently occurring items in data mining. Most techniques find all the interesting patterns in a collection of precise data, where the items occurring in each transaction are known to the system with certainty. However, in many real-time applications, users are interested in only a tiny portion of the large set of frequent patterns. The proposed user-constrained mining approach therefore helps find the frequent patterns in which the user is interested. The approach efficiently finds user-interested frequent patterns by applying user constraints to collections of uncertain data. The user can specify their interest in the form of constraints, and the MapReduce model is used to find uncertain frequent patterns that satisfy the user-specified constraints.
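A minimal sketch of the two ingredients named in this abstract, shown without the MapReduce machinery: expected support over uncertain transactions (item existence probabilities, independence assumed) and a user-specified constraint applied during mining. The items, probabilities, and constraint are hypothetical.

```python
from itertools import combinations

# Uncertain transactions: each item carries an existence probability (illustrative values).
uncertain_db = [
    {"milk": 0.9, "bread": 0.7, "eggs": 0.4},
    {"milk": 0.8, "eggs": 0.6},
    {"bread": 0.9, "eggs": 0.5},
]
MIN_EXPECTED_SUPPORT = 0.5

def user_constraint(itemset):
    """Hypothetical user interest: only patterns that involve 'eggs'."""
    return "eggs" in itemset

def expected_support(itemset):
    """Average over transactions of the product of item probabilities (independence assumed)."""
    total = 0.0
    for t in uncertain_db:
        p = 1.0
        for item in itemset:
            p *= t.get(item, 0.0)
        total += p
    return total / len(uncertain_db)

items = sorted({i for t in uncertain_db for i in t})
for k in range(1, len(items) + 1):
    for itemset in combinations(items, k):
        if user_constraint(itemset) and expected_support(itemset) >= MIN_EXPECTED_SUPPORT:
            print(itemset, round(expected_support(itemset), 3))
```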


2017 ◽  
Author(s):  
◽  
Michael Phinney

Frequent pattern mining is a classic data mining technique, generally applicable to a wide range of application domains, and a mature area of research. The fundamental challenge arises from the combinatorial nature of frequent itemsets, scaling exponentially with respect to the number of unique items. Apriori-based and FPTree-based algorithms have dominated the space thus far. Initial phases of this research relied on the Apriori algorithm and utilized a distributed computing environment; we proposed the Cartesian Scheduler to manage Apriori's candidate generation process. To address the limitation of bottom-up frequent pattern mining algorithms such as Apriori and FPGrowth, we propose the Frequent Hierarchical Pattern Tree (FHPTree): a tree structure and new frequent pattern mining paradigm. The classic problem is redefined as frequent hierarchical pattern mining where the goal is to detect frequent maximal pattern covers. Under the proposed paradigm, compressed representations of maximal patterns are mined using a top-down FHPTree traversal, FHPGrowth, which detects large patterns before their subsets, thus yielding significant reductions in computation time. The FHPTree memory footprint is small; the number of nodes in the structure scales linearly with respect to the number of unique items. Additionally, the FHPTree serves as a persistent, dynamic data structure to index frequent patterns and enable efficient searches. When the search space is exponential, efficient targeted mining capabilities are paramount; this is one of the key contributions of the FHPTree. This dissertation will demonstrate the performance of FHPGrowth, achieving a 300x speed up over state-of-the-art maximal pattern mining algorithms and approximately a 2400x speedup when utilizing FHPGrowth in a distributed computing environment. In addition, we allude to future research opportunities, and suggest various modifications to further optimize the FHPTree and FHPGrowth. Moreover, the methods we offer will have an impact on other data mining research areas including contrast set mining as well as spatial and temporal mining.
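The FHPTree structure itself is not reproduced here; the brute-force sketch below only illustrates the top-down principle the abstract relies on: if a large itemset is frequent, all of its subsets are frequent too, so testing large candidates first lets maximal patterns be reported without enumerating their subsets.

```python
from itertools import combinations

# Toy data; real FHPGrowth traverses a tree index rather than enumerating candidates.
transactions = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b", "c", "d"}, {"b", "c", "d"}]
MIN_SUPPORT_COUNT = 2

def is_frequent(itemset):
    return sum(itemset <= t for t in transactions) >= MIN_SUPPORT_COUNT

# Top-down search: test the largest candidates first; a frequent itemset is reported
# as a maximal pattern and its subsets are skipped, since they are frequent by definition.
items = sorted({i for t in transactions for i in t})
maximal = []
for k in range(len(items), 0, -1):
    for candidate in map(frozenset, combinations(items, k)):
        if any(candidate <= m for m in maximal):
            continue                      # already covered by a larger maximal pattern
        if is_frequent(candidate):
            maximal.append(candidate)

print([sorted(m) for m in maximal])       # [['a', 'b', 'c', 'd']] with this toy data
```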


Author(s):  
Kun-Ming Yu ◽  
Sheng-Hui Liu ◽  
Li-Wei Zhou ◽  
Shu-Hao Wu

Frequent pattern mining has been playing an essential role in knowledge discovery and data mining tasks that try to find usable patterns from databases. Efficiency is especially crucial for an algorithm that must find frequent itemsets in a large database. Numerous methods have been proposed to solve this problem, such as Apriori and FP-growth, which are regarded as fundamental frequent pattern mining methods. In addition, parallel computing architectures, such as on-cloud platforms, grid systems, and multi-core and GPU platforms, have become popular in data mining. However, most algorithms have been proposed without considering the prevalent multi-core architectures. In this study, multi-core architectures were used together with two high-efficiency, load-balancing parallel data mining methods based on the Apriori algorithm. The main goal of the proposed algorithms was to reduce the massive number of duplicate candidates generated by previous methods. This goal was achieved: in a detailed experimental study, the proposed algorithms performed better than the previous methods. The experimental results demonstrated that the proposed algorithms dramatically reduced computation time when using more threads. Moreover, the observations showed that the workload was equally balanced among the computing units.
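The paper's two load-balancing algorithms are not reconstructed here; the sketch below only illustrates the underlying idea of partitioning candidate itemsets across workers so that no candidate is generated or counted twice. It uses Python threads purely to show the partitioning; in CPython a process pool or native threads would be needed for real multi-core speedups.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations

transactions = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c", "d"}, {"b", "d"}]
candidates = [frozenset(c) for c in combinations(sorted({i for t in transactions for i in t}), 2)]
N_WORKERS = 4

def count_chunk(chunk):
    """Each worker counts support only for its own, disjoint slice of the candidates."""
    return Counter({c: sum(c <= t for t in transactions) for c in chunk})

# Static partitioning: candidate i goes to worker i % N_WORKERS, so no candidate is
# generated or counted twice across workers.
chunks = [candidates[i::N_WORKERS] for i in range(N_WORKERS)]
with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
    partial_counts = pool.map(count_chunk, chunks)

totals = Counter()
for partial in partial_counts:
    totals.update(partial)
print(totals.most_common(3))
```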


2011 ◽  
pp. 32-56
Author(s):  
Osmar R. Zaïane ◽  
Mohammed El-Hajj

Frequent Itemset Mining (FIM) is a key component of many algorithms that extract patterns from transactional databases. For example, FIM can be leveraged to produce association rules, clusters, classifiers or contrast sets. This capability provides a strategic resource for decision support, and is most commonly used for market basket analysis. One challenge for frequent itemset mining is the potentially huge number of extracted patterns, which can eclipse the original database in size. In addition to increasing the cost of mining, this makes it more difficult for users to find the valuable patterns. Introducing constraints to the mining process helps mitigate both issues. Decision makers can restrict discovered patterns according to specified rules. By applying these restrictions as early as possible, the cost of mining can be constrained. For example, users may be interested in purchases whose total price exceeds $100, or whose items cost between $50 and $100. In cases of extremely large data sets, pushing constraints sequentially is not enough and parallelization becomes a must. However, specific design is needed to achieve sizes never reported before in the literature.
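A minimal sketch of pushing the per-item price constraint from the example above ("items cost between $50 and $100") into a level-wise search: items violating the constraint are discarded before any candidate is generated. Prices, items, and transactions are illustrative.

```python
from itertools import combinations

# Hypothetical catalogue and market-basket transactions.
prices = {"tv": 400, "cable": 60, "mount": 75, "remote": 20, "cleaner": 55}
transactions = [
    {"tv", "cable", "mount"},
    {"cable", "mount", "cleaner"},
    {"mount", "cleaner"},
    {"cable", "cleaner", "remote"},
]
MIN_SUPPORT_COUNT = 2

def satisfies_item_price(item):
    """Constraint pushed early: every item must cost between $50 and $100."""
    return 50 <= prices[item] <= 100

# Items violating the constraint can never appear in a satisfying itemset,
# so they are discarded before any candidate is generated or counted.
allowed = sorted(i for i in prices if satisfies_item_price(i))

frequent = []
for k in range(1, len(allowed) + 1):
    for candidate in map(frozenset, combinations(allowed, k)):
        if sum(candidate <= t for t in transactions) >= MIN_SUPPORT_COUNT:
            frequent.append(candidate)

print([sorted(f) for f in frequent])
```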

