Pattern Mining: Latest Research Papers

Scalable Mining of High-Utility Sequential Patterns With Three-Tier MapReduce Model

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3487046 ◽

2022 ◽

Vol 16 (3) ◽

pp. 1-26

Author(s):

Jerry Chun-Wei Lin ◽

Youcef Djenouri ◽

Gautam Srivastava ◽

Yuanfa Li ◽

Philip S. Yu

Keyword(s):

Large Scale ◽

Pattern Mining ◽

Sequential Pattern Mining ◽

Main Memory ◽

Frequent Itemset ◽

Sequential Pattern ◽

Sequential Patterns ◽

Speed Up ◽

MapReduce Model ◽

High Utility

High-utility sequential pattern mining (HUSPM) has been a hot research topic in recent decades because it combines sequential and utility properties to reveal more information and knowledge than traditional frequent itemset mining or sequential pattern mining. Several HUSPM approaches have been presented, but most of them rely on main memory to speed up mining performance. However, this assumption is neither realistic nor suitable in large-scale environments, since in real industry the collected data is very large and cannot fit into the main memory of a single machine. In this article, we first develop a parallel and distributed three-stage MapReduce model for mining high-utility sequential patterns from large-scale databases. Two properties are then developed to ensure the correctness and completeness of the discovered patterns in the developed framework. In addition, two data structures called sidset and utility-linked list are utilized in the developed framework to accelerate the computation for mining the required patterns. From the results, we can observe that the designed model performs well on large-scale datasets in terms of runtime, memory, efficiency of the number of distributed nodes, and scalability compared to the serial HUSP-Span approach.
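As a rough illustration of the idea in the abstract above, the sketch below computes the utility of one sequential pattern over a partitioned database in a map-then-reduce style. It is not the paper's three-stage model and does not use the sidset or utility-linked list structures. It assumes the common HUSPM convention that a pattern's utility in a sequence is the maximum over its occurrences, and that each sequence is a list of single `(item, utility)` pairs. The names `pattern_utility`, `map_partition`, and `total_utility` are hypothetical.

```python
from functools import reduce
from operator import add


def pattern_utility(sequence, pattern):
    """Maximum utility over all occurrences of `pattern` in one sequence.

    `sequence` is a list of (item, utility) pairs (a simplifying assumption);
    `pattern` is a list of items that must appear in order.
    """
    # best[j] = highest utility of matching the first j pattern items so far.
    best = [0] + [None] * len(pattern)
    for item, utility in sequence:
        # Go right to left so one element is not used for two pattern positions.
        for j in range(len(pattern), 0, -1):
            if item == pattern[j - 1] and best[j - 1] is not None:
                candidate = best[j - 1] + utility
                if best[j] is None or candidate > best[j]:
                    best[j] = candidate
    return best[-1] or 0  # 0 when the pattern does not occur


def map_partition(partition, pattern):
    """Map step: partial utility of the pattern within one data partition."""
    return sum(pattern_utility(seq, pattern) for seq in partition)


def total_utility(partitions, pattern):
    """Reduce step: add up the partial utilities from all partitions."""
    return reduce(add, (map_partition(p, pattern) for p in partitions), 0)


# Hypothetical toy database split into two partitions.
s1 = [("a", 2), ("b", 3), ("a", 5), ("b", 1)]
s2 = [("b", 4), ("a", 1)]
s3 = [("a", 1), ("c", 2), ("b", 2)]
partitions = [[s1], [s2, s3]]

utility = total_utility(partitions, ["a", "b"])
print(utility, utility >= 8)  # compare against a hypothetical minimum utility of 8
```

In a real distributed setting each call to `map_partition` would run on a separate node; here the partitions are plain Python lists so the example stays self-contained.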

RHPTree—Risk Hierarchical Pattern Tree for Scalable Long Pattern Mining

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3488380 ◽

2022 ◽

Vol 16 (4) ◽

pp. 1-33

Author(s):

Danlu Liu ◽

Yu Li ◽

William Baskett ◽

Dan Lin ◽

Chi-Ren Shyu

Keyword(s):

High Performance ◽

Large Scale ◽

Pattern Mining ◽

Tree Structure ◽

Research Initiative ◽

Speed Up ◽

Dynamic Tree ◽

Significant Patterns ◽

Search Approach ◽

Hierarchical Pattern

Risk patterns are crucial in biomedical research and have served as an important factor in precision health and disease prevention. Despite recent developments in parallel and high-performance computing, existing risk pattern mining methods still struggle with problems caused by large-scale datasets, such as redundant candidate generation, inability to discover long significant patterns, and prolonged post-pattern filtering. In this article, we propose a novel dynamic tree structure, Risk Hierarchical Pattern Tree (RHPTree), and a top-down search method, RHPSearch, which are capable of efficiently analyzing a large volume of data and overcoming the limitations of previous works. The dynamic nature of the RHPTree avoids costly tree reconstruction for the iterative search process and dataset updates. We also introduce two specialized search methods, the extended target search (RHPSearch-TS) and the parallel search approach (RHPSearch-SD), to further speed up the retrieval of certain items of interest. Experiments on both UCI machine learning datasets and sampled datasets from the Simons Simplex Collection (SSC) of the Simons Foundation Autism Research Initiative (SFARI) demonstrate that our method is not only faster but also more effective in identifying comprehensive long risk patterns than existing works. Moreover, the proposed new tree structure is generic and applicable to other pattern mining problems.

HUFTI-SPM: high-utility and frequent time-interval sequential pattern mining from transactional databases

International Journal of Data Science and Analytics ◽

10.1007/s41060-021-00297-7 ◽

2022 ◽

Author(s):

Ritika ◽

Sunil Kumar Gupta

Keyword(s):

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Time Interval ◽

Transactional Databases ◽

High Utility

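Application of Data Mining Using the Apriori Algorithm to Determine Group Patterns of People with Social Welfare Problems (original title: PENERAPAN DATA MINING MENGGUNAKAN ALGORITMA APRIORI UNTUK MENENTUKAN POLA GOLONGAN PENYANDANG MASALAH KESEJAHTERAAN SOSIAL)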

Sebatik ◽

10.46984/sebatik.v26i1.1622 ◽

2022 ◽

Vol 26 (1) ◽

Author(s):

Irwan Adji Darmawan ◽

Muhammad Fakhri Randy ◽

Imam Yunianto ◽

Muhamad Malik Mutoffar ◽

M Tio Putra Salis

Keyword(s):

Data Mining ◽

Association Rule ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Itemset ◽

Frequent Pattern ◽

Data Set ◽

Minimum Support

People with Social Welfare Problems (Penyandang Masalah Kesejahteraan Sosial, PMKS) are one of the many problems found in urban areas, because they can disrupt urban development, public order, security, and stability. So far, the measures taken have focused on handling PMKS and have not yet turned toward prevention. Determining the patterns of PMKS groups is one approach that can be taken. The Apriori algorithm helps find patterns in data (frequent pattern mining) by determining frequent itemsets using the Association Rule method in data mining. The manual calculation with the Apriori algorithm yielded combination patterns including 3 rules with a minimum support of 15% and a highest confidence of 100%. The RapidMiner application was used to test the Apriori algorithm. RapidMiner is one of several data mining software packages, used for example to analyze text and extract patterns from data sets, combining statistical methods, databases, and artificial intelligence to obtain high-value information from the processed data. The results were obtained from a comparison test of patterns across PMKS groups. From the tests using the RapidMiner application and the manual Apriori calculation, it is concluded, under the test criteria, that the group patterns (rules) with confidence values (c) from the manual Apriori calculation cannot be said to approximate the RapidMiner results, so the accuracy of the test is low, at only 37.5%.
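For readers unfamiliar with the method named in this abstract, the following is a minimal sketch of Apriori with a minimum support threshold, followed by rule generation with a minimum confidence threshold. The transactions are invented toy data, not the PMKS data set, and the function names `apriori` and `association_rules` are hypothetical. The 15% support figure from the abstract is not reproduced here.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return a dict mapping each frequent itemset (frozenset) to its support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    singletons = {frozenset([item]) for t in transactions for item in t}
    level = {s for s in singletons if support(s) >= min_support}
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must already be frequent.
        level = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, confidence))
    return rules


# Hypothetical toy transactions.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
frequent = apriori(transactions, min_support=0.5)
rules = association_rules(frequent, min_confidence=1.0)
for lhs, rhs, sup, conf in rules:
    print(sorted(lhs), "->", sorted(rhs), sup, conf)
```

The lookup `frequent[lhs]` is safe because every subset of a frequent itemset is itself frequent, which is the property Apriori relies on for pruning.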

Rhythmus periodic frequent pattern mining without periodicity threshold

Journal of Ambient Intelligence and Humanized Computing ◽

10.1007/s12652-021-03617-8 ◽

2022 ◽

Author(s):

Subrata Datta ◽

Kalyani Mali ◽

Sourav Das ◽

Srijita Kundu ◽

Sayanta Harh

Keyword(s):

Pattern Mining ◽

Frequent Pattern Mining ◽

Frequent Pattern

Mining Simple Path Traversal Patterns in Knowledge Graph

Journal of Web Engineering ◽

10.13052/jwe1540-9589.2128 ◽

2022 ◽

Author(s):

Feng Xiong ◽

Hongzhi Wang

Keyword(s):

Data Mining ◽

Pattern Mining ◽

Divide And Conquer ◽

Simple Path ◽

Knowledge Graph ◽

Sequence Mining ◽

High Coverage ◽

List Structure ◽

Life Force ◽

Mining Algorithms

Data mining has remained a subject of enduring interest for research. Knowledge graphs have risen in recent years, showing great vitality and strong development potential, and it has been observed that acyclic knowledge graphs can enhance usability. Although the development of knowledge graphs has provided ample scope for demonstrating the abilities of data mining, related research is still insufficient. In this paper, we introduce path traversal pattern mining to knowledge graphs. We design a novel simple path traversal pattern mining framework to improve the representativeness of the results. A divide-and-conquer approach that combines paths is proposed to discover the most frequent traversal patterns in a knowledge graph. To support the algorithm, we design a linked list structure indexed by sequence length, with handy operations. The correctness of the algorithm is proven. Experiments show that our algorithm reaches high coverage with a small number of output patterns compared to existing frequent sequence mining algorithms.

Extending Association Rule Mining to Microbiome Pattern Analysis: Tools and Guidelines to Support Real Applications

Frontiers in Bioinformatics ◽

10.3389/fbinf.2021.794547 ◽

2022 ◽

Vol 1 ◽

Author(s):

Agostinetto Giulia ◽

Sandionigi Anna ◽

Bruno Antonia ◽

Pescini Dario ◽

Casiraghi Maurizio

Keyword(s):

Machine Learning ◽

16S rRNA ◽

Association Rule ◽

Association Rule Mining ◽

Pattern Mining ◽

Microbial Community Composition ◽

Frequent Itemset ◽

Supervised Machine Learning ◽

Rule Mining ◽

Microbiome Data

Boosted by the exponential growth of microbiome-based studies, analyzing microbiome patterns is now a hot topic with different fields of application. In particular, the use of machine learning techniques is increasing in microbiome studies, providing deep insights into microbial community composition. In this context, in order to investigate microbial patterns from 16S rRNA metabarcoding data, we explored the effectiveness of the Association Rule Mining (ARM) technique, a supervised-machine learning procedure, to extract patterns (in this work, intended as groups of species or taxa) from microbiome data. ARM can generate huge amounts of data, making it challenging to remove spurious information and visualize results. Our work sheds light on the strengths and weaknesses of the pattern mining strategy in the study of microbial patterns, in particular from 16S rRNA microbiome datasets, applying ARM to real case studies and providing guidelines for future usage. Our results highlighted issues related to the type of input and the use of metadata in microbial pattern extraction, identifying the key steps that must be considered to apply ARM consciously to 16S rRNA microbiome data. To promote the use of ARM and the visualization of microbiome patterns, we developed microFIM (microbial Frequent Itemset Mining), a versatile Python tool that facilitates the use of ARM by integrating common microbiome outputs, such as taxa tables. microFIM implements interest measures to remove spurious information and merges the results of ARM analysis with the common microbiome outputs, providing similar microbiome strategies that help scientists to integrate ARM in microbiome applications. With this work, we aimed to create a bridge between microbial ecology researchers and the ARM technique, making researchers aware of the strengths and weaknesses of the association rule mining approach.
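The abstract above mentions interest measures for filtering spurious patterns. As a small, generic illustration (not microFIM's actual interface, which is not described here), the sketch below computes support and lift for a pair of taxa from per-sample presence sets. The sample data and the names `support` and `lift` are assumptions made for this example.

```python
def support(samples, taxa):
    """Fraction of samples that contain every taxon in `taxa`."""
    taxa = set(taxa)
    return sum(1 for s in samples if taxa <= s) / len(samples)


def lift(samples, lhs, rhs):
    """Lift of the rule lhs -> rhs; values near 1 suggest independence."""
    joint = support(samples, set(lhs) | set(rhs))
    return joint / (support(samples, lhs) * support(samples, rhs))


# Hypothetical presence sets: one set of observed taxa per sample.
samples = [
    {"taxon1", "taxon2"},
    {"taxon1", "taxon2"},
    {"taxon1"},
    {"taxon3"},
]

print(support(samples, {"taxon1", "taxon2"}))
print(lift(samples, {"taxon1"}, {"taxon2"}))
```

A lift above 1 means the two taxa co-occur more often than expected if they were independent, which is one common way to rank or filter rules.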

NetNMSP: Nonoverlapping maximal sequential pattern mining

Applied Intelligence ◽

10.1007/s10489-021-02912-3 ◽

2022 ◽

Author(s):

Yan Li ◽

Shuai Zhang ◽

Lei Guo ◽

Jing Liu ◽

Youxi Wu ◽

...

Keyword(s):

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern

Research on dynamic and secure storage of financial data based on cloud platform

Web Intelligence ◽

10.3233/web-210472 ◽

2022 ◽

pp. 1-12

Author(s):

Jingyi Li

Keyword(s):

Data Storage ◽

Pattern Mining ◽

Rough Set Theory ◽

Risk Model ◽

Frequent Pattern Mining ◽

Financial Data ◽

Frequent Pattern ◽

Data Types ◽

Computing Platform ◽

Secure Storage

Traditional financial data storage methods are prone to data leakage and narrow data coverage. Therefore, this paper proposes a dynamic and secure storage method for financial data based on a cloud platform. In order to improve enterprise data management, the paper constructs a financial cloud computing platform, mines financial data using rough set theory, and analyzes the results of frequent pattern mining of financial data using fuzzy attribute characteristics. According to granularity theory, the financial data is classified and processed, and the CSA cloud risk model is established to realize the dynamic and secure storage of financial data. The experimental results show that the maximum data storage delay of this method is no more than 4.1 s, the maximum data leakage risk coefficient is no more than 0.5, the number of data types can reach 30, and the data storage coverage is improved.

Occupancy‐based utility pattern mining in dynamic environments of intelligent systems

International Journal of Intelligent Systems ◽

10.1002/int.22799 ◽

2022 ◽

Author(s):

Taewoong Ryu ◽

Unil Yun ◽

Chanhee Lee ◽

Jerry Chun‐Wei Lin ◽

Witold Pedrycz

Keyword(s):

Intelligent Systems ◽

Pattern Mining ◽

Dynamic Environments
