frequent itemset
Recently Published Documents


Total documents: 691 (five years: 200)
H-index: 27 (five years: 5)

2022 ◽  
Vol 16 (3) ◽  
pp. 1-26
Author(s):  
Jerry Chun-Wei Lin ◽  
Youcef Djenouri ◽  
Gautam Srivastava ◽  
Yuanfa Li ◽  
Philip S. Yu

High-utility sequential pattern mining (HUSPM) has been an active research topic in recent decades because it combines sequential and utility properties to reveal more information and knowledge than traditional frequent itemset mining or sequential pattern mining. Several HUSPM approaches have been presented, but most of them rely on main memory to speed up mining. This assumption is unrealistic in large-scale environments: in real industrial settings, the collected data is very large and cannot fit into the main memory of a single machine. In this article, we first develop a parallel and distributed three-stage MapReduce model for mining high-utility sequential patterns from large-scale databases. Two properties are then developed to ensure the correctness and completeness of the patterns discovered by the framework. In addition, two data structures, called sidset and utility-linked list, are used in the framework to accelerate the computation required for mining the patterns. The results show that the designed model performs well on large-scale datasets in terms of runtime, memory, efficiency with respect to the number of distributed nodes, and scalability compared to the serial HUSP-Span approach.


2022 ◽  
Vol 54 (9) ◽  
pp. 1-35
Author(s):  
Lázaro Bustio-Martínez ◽  
René Cumplido ◽  
Martín Letras ◽  
Raudel Hernández-León ◽  
Claudia Feregrino-Uribe ◽  
...  

In data mining, Frequent Itemset Mining is a technique used in several domains with notable results. However, the large volume of data in modern datasets increases the processing time of Frequent Itemset Mining algorithms, making them unsuitable for many real-world applications. Accordingly, proposing new Frequent Itemset Mining methods that obtain frequent itemsets in a realistic amount of time is still an open problem. A successful alternative is to employ hardware acceleration using Graphics Processing Units (GPUs) and Field-Programmable Gate Arrays (FPGAs). In this article, a comprehensive review of the state of the art in Frequent Itemset Mining hardware acceleration is presented. Several FPGA-based and GPU-based approaches are contrasted to show their strengths and weaknesses. This survey gathers the most relevant and most recent research efforts to improve the performance of Frequent Itemset Mining with respect to algorithmic advances and modern development platforms. Furthermore, it organizes the current research on Frequent Itemset Mining from the hardware perspective, considering the source of the data, the development platform, and the baseline algorithm.


Sebatik ◽  
2022 ◽  
Vol 26 (1) ◽  
Author(s):  
Irwan Adji Darmawan ◽  
Muhammad Fakhri Randy ◽  
Imam Yunianto ◽  
Muhamad Malik Mutoffar ◽  
M Tio Putra Salis

People with Social Welfare Problems (Penyandang Masalah Kesejahteraan Sosial, PMKS) are one of the many problems found in urban areas, because they can disrupt urban development, public order, security, and stability. So far, the measures taken have focused on handling PMKS cases rather than on prevention. Determining patterns among PMKS categories is one possible approach. The Apriori algorithm helps find patterns in data (frequent pattern mining) by determining frequent itemsets, using the association rule method from data mining. The manual calculation with the Apriori algorithm yielded combination patterns including 3 rules with a minimum support of 15% and a highest confidence of 100%. The RapidMiner application was used to test the Apriori algorithm. RapidMiner is one of several data mining software packages; it can, for example, analyze text and extract patterns from datasets, combining statistical, database, and artificial intelligence methods to obtain high-value information from processed data. The results were obtained from a comparative test of patterns between PMKS categories. From the RapidMiner test and the manual Apriori calculation, it was concluded, under the test criteria, that the category patterns (rules) and their confidence values (c) from the manual Apriori calculation do not come close to the RapidMiner results, so the test accuracy can be considered low, at only 37.5%.
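
As an illustration of the kind of manual Apriori calculation the abstract refers to, the following is a minimal sketch of level-wise frequent itemset mining and rule confidence. The function names `apriori` and `confidence`, the item labels, and the toy transactions are assumptions made for this example; they are not taken from the paper or from RapidMiner.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    tx = [frozenset(t) for t in transactions]
    n = len(tx)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in tx if itemset <= t) / n

    # Level 1 candidates: every single item seen in the data.
    level = {frozenset([item]) for t in tx for item in t}
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates that meet the minimum support.
        current = {}
        for candidate in level:
            s = support(candidate)
            if s >= min_support:
                current[candidate] = s
        frequent.update(current)

        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent, from the mined supports."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical toy data: each transaction is a set of category labels.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]
frequent = apriori(transactions, min_support=0.5)
rule_confidence = confidence(frequent, {"A"}, {"B"})
```

The pruning step is what distinguishes Apriori from brute-force enumeration: a candidate is only counted against the data if all of its smaller subsets already passed the support threshold.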


2022 ◽  
Vol 1 ◽  
Author(s):  
Agostinetto Giulia ◽  
Sandionigi Anna ◽  
Bruno Antonia ◽  
Pescini Dario ◽  
Casiraghi Maurizio

Boosted by the exponential growth of microbiome-based studies, the analysis of microbiome patterns is now a hot topic with applications in many fields. In particular, the use of machine learning techniques is increasing in microbiome studies, providing deep insights into microbial community composition. In this context, to investigate microbial patterns in 16S rRNA metabarcoding data, we explored the effectiveness of Association Rule Mining (ARM), a supervised machine learning procedure, for extracting patterns (in this work, groups of species or taxa) from microbiome data. ARM can generate huge amounts of data, which makes removing spurious information and visualizing results challenging. Our work sheds light on the strengths and weaknesses of pattern mining in the study of microbial patterns, in particular in 16S rRNA microbiome datasets, by applying ARM to real case studies and providing guidelines for future usage. Our results highlight issues related to the type of input and the use of metadata in microbial pattern extraction, and identify the key steps that must be considered to apply ARM carefully to 16S rRNA microbiome data. To promote the use of ARM and the visualization of microbiome patterns, we developed microFIM (microbial Frequent Itemset Mining), a versatile Python tool that facilitates the use of ARM by integrating common microbiome outputs, such as taxa tables. microFIM implements interest measures to remove spurious information and merges the results of ARM analysis with the common microbiome outputs, providing strategies familiar from microbiome analysis that help scientists integrate ARM into microbiome applications. With this work, we aim to build a bridge between microbial ecology researchers and ARM, making researchers aware of the strengths and weaknesses of the association rule mining approach.


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Guoliang Si ◽  
Hengyi Lv ◽  
Hangfei Yuan ◽  
Dan Xie ◽  
Ce Peng

With the rapid development of Internet technology, millions of small, medium, and micro enterprises are using Internet recruitment platforms to host their recruitment information. Their positions differ in job requirements and benefits, and job seekers need to understand these differences when choosing a position. Existing Internet recruitment platforms provide neither a detailed analysis of positions nor visual methods for multidimensional matching of positions and job applicants, so candidates must spend a lot of effort screening for suitable positions. In this paper, we propose an efficient, interpretable visualization method for multidimensional structural data matching between job seekers and positions. First, we extract keywords describing the job seeker's abilities and desired benefits from personal information, and we generate a job seeker ability table and a job seeker demand table. Next, from the association rules generated by each frequent itemset of the recruitment data, we calculate the support, confidence, and lift (promotion degree) of each rule to obtain the association rule table. We further explore the relationships between the skills required for the three types of positions based on the association rules. Finally, we use a regression method to build a salary forecasting model. On this basis, we predict the salary of job seekers from the work experience, education, and work city they provide. Simulation results show that our method performs better on job analysis and recommendation.
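
The three rule measures mentioned above can be sketched as follows. This is a minimal illustration using the standard definitions of support, confidence, and lift; the function name `rule_metrics` and the skill keywords in the toy postings are hypothetical and do not come from the paper's dataset.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    tx = [frozenset(t) for t in transactions]
    n = len(tx)

    n_a = sum(1 for t in tx if a <= t)          # transactions containing the antecedent
    n_c = sum(1 for t in tx if c <= t)          # transactions containing the consequent
    n_ac = sum(1 for t in tx if (a | c) <= t)   # transactions containing both

    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares the rule's confidence with the consequent's baseline frequency.
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical job postings, each reduced to a set of skill keywords.
postings = [
    {"python", "sql"},
    {"python", "sql", "spark"},
    {"java", "sql"},
    {"python"},
]
support, confidence, lift = rule_metrics(postings, {"python"}, {"sql"})
```

A lift above 1 indicates that the antecedent and consequent co-occur more often than they would if they were independent; a lift below 1, as in this toy data, indicates the opposite.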


2021 ◽  
Vol 50 (4) ◽  
pp. 627-644
Author(s):  
Shariq Bashir ◽  
Daphne Teck Ching Lai

Mining approximate frequent itemsets (AFIs) from noisy databases is computationally more expensive than traditional frequent itemset mining, because AFI mining algorithms generate a large number of candidate itemsets. This article proposes an algorithm that mines AFIs using a pattern growth approach. The major contribution of the proposed approach is that it mines core patterns and examines the approximate conditions of candidate AFIs directly, in a single phase and with two full scans of the database. Related algorithms apply an Apriori-based candidate generate-and-test approach and require multiple phases to obtain the complete set of AFIs: the first phase generates core patterns, and the second phase examines the approximate conditions of those core patterns. Specifically, the article proposes novel techniques for mapping transactions onto an approximate FP-tree and for mining AFIs from the conditional patterns of the approximate FP-tree. The approximate FP-tree maps transactions onto shared branches when the transactions share a similar set of items. This reduces the size of the database and helps to compute the approximate conditions of candidate itemsets efficiently. We compare the performance of our algorithm with state-of-the-art AFI mining algorithms on benchmark databases. The experiments are analyzed by comparing the processing time and scalability of the algorithms for varying database sizes and transaction lengths. The results show that the pattern growth approach mines AFIs in less processing time than related Apriori-based algorithms.
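
To make the idea of shared branches concrete, here is a minimal sketch of how a standard (exact) FP-tree is built in two database scans. It is not the paper's approximate FP-tree, whose mapping of similar transactions is not specified in the abstract; the class `FPNode`, the functions `build_fp_tree` and `count_nodes`, and the toy data are assumptions for illustration only.

```python
from collections import Counter


class FPNode:
    """One node of an FP-tree: an item, its count, and its child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # First scan: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))

    root = FPNode(None)
    # Second scan: insert each transaction along a path from the root.
    for t in transactions:
        # Drop infrequent items and order the rest by descending frequency
        # (ties broken by name) so that common prefixes share a branch.
        items = sorted(
            (i for i in set(t) if counts[i] >= min_count),
            key=lambda i: (-counts[i], i),
        )
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Hypothetical toy database of four transactions (nine items in total).
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
tree = build_fp_tree(transactions, min_count=2)
node_total = count_nodes(tree)
```

Because transactions with a common prefix reuse the same nodes, the tree stores fewer nodes than there are items in the database, which is the compression effect the abstract attributes to shared branches.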


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Erna Hikmawati ◽  
Nur Ulfa Maulidevi ◽  
Kridanto Surendro

Association rule mining is a technique that is widely used in data mining. It is used to identify interesting relationships between sets of items in a dataset and to predict associative behavior for new data. Before rules are formed, the items that will be involved, called the frequent itemset, must be determined. In this step, a threshold known as the minimum support is used to eliminate items that do not belong to the frequent itemset. The threshold also plays an important role in determining the number of rules generated, and setting the wrong threshold causes association rule mining to fail to obtain rules. Currently, users determine the minimum support value arbitrarily. The problem is worse for a user who is unfamiliar with the dataset characteristics, and it consumes a lot of memory and time, because the rule formation process is repeated until it finds the desired number of rules. In the adaptive support model, the minimum support value is determined from the average and total number of items in each transaction, as well as their support values. The proposed method also uses certain criteria as thresholds, so that the resulting rules match user needs. The minimum support value in the proposed method is obtained from the average utility value divided by the total number of existing transactions. Experiments were carried out on 8 datasets with different characteristics to determine the association rules. The proposed adaptive support method was tested with 2 basic association rule algorithms, namely Apriori and FP-Growth. The test was carried out repeatedly to determine the highest and lowest minimum support values. The results showed that 6 out of 8 datasets produced minimum and maximum support values for the Apriori and FP-Growth algorithms. This means that the proposed adaptive support can generate rules of good quality, as the rules it produces have a lift ratio greater than 1. The dataset characteristics obtained from the experimental results can be used as a factor in determining the minimum threshold value.

