pattern mining
Recently Published Documents


TOTAL DOCUMENTS

2483
(FIVE YEARS 660)

H-INDEX

48
(FIVE YEARS 11)

2022 ◽  
Vol 16 (3) ◽  
pp. 1-26
Author(s):  
Jerry Chun-Wei Lin ◽  
Youcef Djenouri ◽  
Gautam Srivastava ◽  
Yuanfa Li ◽  
Philip S. Yu

High-utility sequential pattern mining (HUSPM) has been an active research topic in recent decades because it combines sequential and utility properties to reveal more information than traditional frequent itemset mining or sequential pattern mining. Several HUSPM methods have been presented, but most of them rely on main memory to speed up mining. This assumption is unrealistic in large-scale environments: in industry, the collected data is often too large to fit into the main memory of a single machine. In this article, we first develop a parallel and distributed three-stage MapReduce model for mining high-utility sequential patterns from large-scale databases. Two properties are then developed to guarantee the correctness and completeness of the discovered patterns in the developed framework. In addition, two data structures, called sidset and utility-linked list, are used in the developed framework to accelerate the computation for mining the required patterns. The results show that the designed model performs well on large-scale datasets in terms of runtime, memory, efficiency with respect to the number of distributed nodes, and scalability compared to the serial HUSP-Span approach.
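
To make the notion of "utility" in HUSPM concrete, the following is a minimal single-machine sketch, not the MapReduce model or the sidset / utility-linked list structures from the article. It assumes the commonly used definition in which a pattern's utility in one sequence is the maximum over all of its occurrences, and its utility in a database is the sum over sequences. For brevity it also assumes each sequence element holds a single item; the names `pattern_utility`, `database_utility`, and the `profit` table are hypothetical.

```python
def pattern_utility(sequence, pattern, profit):
    """Utility of `pattern` in one sequence (max over all occurrences).

    sequence: list of (item, quantity) pairs, in order
    pattern:  list of items that must appear as a subsequence
    profit:   dict mapping item -> unit profit (assumed, illustrative)
    """
    neg_inf = float("-inf")
    # best[j] = highest utility of any match of the first j pattern items so far
    best = [0] + [neg_inf] * len(pattern)
    for item, qty in sequence:
        # Go backwards so one sequence element extends a match at most once
        for j in range(len(pattern), 0, -1):
            if pattern[j - 1] == item and best[j - 1] != neg_inf:
                best[j] = max(best[j], best[j - 1] + qty * profit[item])
    return best[-1] if best[-1] != neg_inf else 0


def database_utility(database, pattern, profit):
    """Sum of the per-sequence utilities of `pattern` over the whole database."""
    return sum(pattern_utility(seq, pattern, profit) for seq in database)


if __name__ == "__main__":
    profit = {"a": 2, "b": 3, "c": 1}
    database = [
        [("a", 1), ("b", 2), ("a", 3), ("b", 1)],
        [("a", 2), ("c", 5)],
    ]
    total = database_utility(database, ["a", "b"], profit)
    min_util = 8  # hypothetical threshold
    print(total, total >= min_util)
```

A pattern is reported as high-utility when its database utility reaches the chosen threshold. Because utility is not anti-monotone, practical miners add upper bounds for pruning, which this sketch omits.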


2022 ◽  
Vol 16 (4) ◽  
pp. 1-33
Author(s):  
Danlu Liu ◽  
Yu Li ◽  
William Baskett ◽  
Dan Lin ◽  
Chi-Ren Shyu

Risk patterns are crucial in biomedical research and have served as an important factor in precision health and disease prevention. Despite recent development in parallel and high-performance computing, existing risk pattern mining methods still struggle with problems caused by large-scale datasets, such as redundant candidate generation, inability to discover long significant patterns, and prolonged post pattern filtering. In this article, we propose a novel dynamic tree structure, Risk Hierarchical Pattern Tree (RHPTree), and a top-down search method, RHPSearch, which are capable of efficiently analyzing a large volume of data and overcoming the limitations of previous works. The dynamic nature of the RHPTree avoids costly tree reconstruction for the iterative search process and dataset updates. We also introduce two specialized search methods, the extended target search (RHPSearch-TS) and the parallel search approach (RHPSearch-SD), to further speed up the retrieval of certain items of interest. Experiments on both UCI machine learning datasets and sampled datasets from the Simons Foundation Autism Research Initiative (SFARI) Simons Simplex Collection (SSC) demonstrate that our method is not only faster but also more effective in identifying comprehensive long risk patterns than existing works. Moreover, the proposed new tree structure is generic and applicable to other pattern mining problems.


Sebatik ◽  
2022 ◽  
Vol 26 (1) ◽  
Author(s):  
Irwan Adji Darmawan ◽  
Muhammad Fakhri Randy ◽  
Imam Yunianto ◽  
Muhamad Malik Mutoffar ◽  
M Tio Putra Salis

People with Social Welfare Problems (Penyandang Masalah Kesejahteraan Sosial, PMKS) are one of many problems found in urban areas, because they can disrupt city development, public order, security, and stability. So far, the measures taken have focused on handling PMKS and have not yet been directed at prevention. Determining the patterns of PMKS groups is one approach that can be taken. The Apriori algorithm helps find patterns in data (frequent pattern mining) by determining frequent itemsets using the association rule method in data mining. The manual calculation with the Apriori algorithm yielded combination patterns comprising 3 rules with a minimum support of 15% and a highest confidence of 100%. The Apriori algorithm was tested with the RapidMiner application. RapidMiner is one of several data mining software packages; it can, for example, analyze text and extract patterns from data sets, combining statistical, database, and artificial intelligence methods to obtain high-value information from the processed data. The results were obtained from a comparison test of patterns among PMKS groups. From the RapidMiner test and the manual Apriori calculation, it was concluded, under the test criteria, that the confidence values (c) of the group patterns (rules) from the manual Apriori calculation do not come close to the RapidMiner results, so the accuracy of the test is low, only 37.5%.
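
As an illustration of the support and confidence values that Apriori produces, here is a minimal sketch of the textbook algorithm on a toy dataset. It is not the study's data or the RapidMiner workflow; the transactions, the item labels, and the helper names `apriori` and `confidence` are assumptions made for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs in the data
    candidates = {frozenset([item]) for t in sets for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep the candidates whose relative support meets the threshold
        level = {}
        for cand in candidates:
            support = sum(1 for t in sets if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-candidates and
        # prune any candidate that has an infrequent k-subset
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


if __name__ == "__main__":
    # Toy transactions with hypothetical category labels
    transactions = [
        {"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B", "C"}, {"A", "B"},
    ]
    freq = apriori(transactions, min_support=0.4)
    for itemset, support in sorted(freq.items(), key=lambda kv: sorted(kv[0])):
        print(sorted(itemset), round(support, 2))
    print("conf(A -> B) =", round(confidence(freq, {"A"}, {"B"}), 2))
```

A rule is typically kept when both its support and its confidence exceed user-chosen minimums, such as the 15% minimum support mentioned in the abstract.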


Author(s):  
Subrata Datta ◽  
Kalyani Mali ◽  
Sourav Das ◽  
Srijita Kundu ◽  
Sayanta Harh

Author(s):  
Feng Xiong ◽  
Hongzhi Wang

Data mining remains an enduringly attractive research subject. Knowledge graphs have risen in recent years and show strong vitality and development potential, and acyclic knowledge graphs have been observed to enhance usability. Although the development of knowledge graphs offers ample scope for applying data mining, related research is still insufficient. In this paper, we introduce path traversal pattern mining to knowledge graphs. We design a novel simple path traversal pattern mining framework to improve the representativeness of the results. A divide-and-conquer approach that combines paths is proposed to discover the most frequent traversal patterns in a knowledge graph. To support the algorithm, we design a linked list structure indexed by sequence length with convenient operations. The correctness of the algorithm is proven. Experiments show that our algorithm reaches high coverage with a small output compared to existing frequent sequence mining algorithms.


2022 ◽  
Vol 1 ◽  
Author(s):  
Agostinetto Giulia ◽  
Sandionigi Anna ◽  
Bruno Antonia ◽  
Pescini Dario ◽  
Casiraghi Maurizio

Boosted by the exponential growth of microbiome-based studies, the analysis of microbiome patterns is now a hot topic with applications in different fields. In particular, the use of machine learning techniques is increasing in microbiome studies, providing deep insights into microbial community composition. In this context, to investigate microbial patterns from 16S rRNA metabarcoding data, we explored the effectiveness of the Association Rule Mining (ARM) technique, a supervised machine learning procedure, for extracting patterns (in this work, groups of species or taxa) from microbiome data. ARM can generate huge amounts of data, making the removal of spurious information and the visualization of results challenging. Our work sheds light on the strengths and weaknesses of the pattern mining strategy in the study of microbial patterns, in particular from 16S rRNA microbiome datasets, by applying ARM to real case studies and providing guidelines for future usage. Our results highlighted issues related to the type of input and the use of metadata in microbial pattern extraction, identifying the key steps that must be considered to apply ARM consciously to 16S rRNA microbiome data. To promote the use of ARM and the visualization of microbiome patterns, we developed microFIM (microbial Frequent Itemset Mining), a versatile Python tool that facilitates the use of ARM by integrating common microbiome outputs, such as taxa tables. microFIM implements interest measures to remove spurious information and merges the results of ARM analysis with the common microbiome outputs, providing similar microbiome strategies that help scientists integrate ARM into microbiome applications. With this work, we aimed to create a bridge between microbial ecology researchers and the ARM technique, making researchers aware of the strengths and weaknesses of the association rule mining approach.
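
The role of interest measures in filtering spurious rules can be sketched with three standard ones: support, confidence, and lift. The code below is a generic illustration and does not reproduce the microFIM API; the function name `rule_metrics`, the sample sets, and the taxa labels are all hypothetical.

```python
def rule_metrics(samples, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent.

    samples: list of sets, each holding the taxa observed in one sample
    """
    a, c = set(antecedent), set(consequent)
    n = len(samples)
    n_a = sum(1 for s in samples if a <= s)          # samples containing antecedent
    n_c = sum(1 for s in samples if c <= s)          # samples containing consequent
    n_ac = sum(1 for s in samples if (a | c) <= s)   # samples containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares the rule's confidence with the consequent's base rate
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


if __name__ == "__main__":
    # Hypothetical presence/absence data: one set of taxa per sample
    samples = [
        {"Bacteroides", "Prevotella"},
        {"Bacteroides", "Prevotella"},
        {"Bacteroides", "Faecalibacterium"},
        {"Faecalibacterium"},
    ]
    print(rule_metrics(samples, {"Bacteroides"}, {"Prevotella"}))
```

A lift near 1 suggests the two groups of taxa co-occur about as often as expected by chance, so a rule with high confidence but lift close to 1 is a candidate for removal.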


Author(s):  
Yan Li ◽  
Shuai Zhang ◽  
Lei Guo ◽  
Jing Liu ◽  
Youxi Wu ◽  
...  

2022 ◽  
pp. 1-12
Author(s):  
Jingyi Li

Traditional financial data storage methods are prone to data leakage and narrow data coverage. Therefore, this paper proposes a dynamic and secure storage method for financial data based on a cloud platform. To improve enterprise data management, the paper constructs a financial cloud computing platform, mines financial data using rough set theory, and analyzes the results of frequent pattern mining of financial data using fuzzy attribute characteristics. According to granularity theory, the financial data is classified and processed, and the CSA cloud risk model is established to realize the dynamic and secure storage of financial data. The experimental results show that the maximum data storage delay of this method is no more than 4.1 s, the maximum data leakage risk coefficient is no more than 0.5, the number of data types can reach 30, and the data storage coverage is improved.


Author(s):  
Taewoong Ryu ◽  
Unil Yun ◽  
Chanhee Lee ◽  
Jerry Chun‐Wei Lin ◽  
Witold Pedrycz
