Extending Association Rule Mining to Microbiome Pattern Analysis: Tools and Guidelines to Support Real Applications

2022 ◽  
Vol 1 ◽  
Author(s):  
Agostinetto Giulia ◽  
Sandionigi Anna ◽  
Bruno Antonia ◽  
Pescini Dario ◽  
Casiraghi Maurizio

Boosted by the exponential growth of microbiome-based studies, the analysis of microbiome patterns is now a hot topic with applications in many fields. In particular, the use of machine learning techniques is increasing in microbiome studies, providing deep insights into microbial community composition. In this context, to investigate microbial patterns from 16S rRNA metabarcoding data, we explored the effectiveness of the Association Rule Mining (ARM) technique, a supervised machine learning procedure, for extracting patterns (in this work, intended as groups of species or taxa) from microbiome data. ARM can generate huge amounts of data, which makes removing spurious information and visualizing results challenging. Our work sheds light on the strengths and weaknesses of the pattern mining strategy in the study of microbial patterns, in particular from 16S rRNA microbiome datasets, by applying ARM to real case studies and providing guidelines for future usage. Our results highlighted issues related to the type of input and the use of metadata in microbial pattern extraction, identifying the key steps that must be considered to apply ARM consciously to 16S rRNA microbiome data. To promote the use of ARM and the visualization of microbiome patterns, we developed microFIM (microbial Frequent Itemset Mining), a versatile Python tool that facilitates the use of ARM by integrating common microbiome outputs, such as taxa tables. microFIM implements interest measures to remove spurious information and merges the results of ARM analysis with common microbiome outputs, providing similar microbiome strategies that help scientists integrate ARM into microbiome applications. With this work, we aimed to create a bridge between microbial ecology researchers and the ARM technique, making researchers aware of the strengths and weaknesses of the association rule mining approach.
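The idea of treating samples as transactions and taxa as items can be illustrated with a minimal sketch. This is not the microFIM API: the toy taxa table, the read counts, the function names (`to_transactions`, `frequent_itemsets`), the presence threshold, and the brute-force enumeration are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy taxa table: sample -> {taxon: read count}. Values are invented.
taxa_table = {
    "S1": {"Bacteroides": 120, "Prevotella": 0, "Faecalibacterium": 35},
    "S2": {"Bacteroides": 80, "Prevotella": 12, "Faecalibacterium": 40},
    "S3": {"Bacteroides": 0, "Prevotella": 50, "Faecalibacterium": 22},
    "S4": {"Bacteroides": 95, "Prevotella": 0, "Faecalibacterium": 0},
}

def to_transactions(table, min_count=1):
    """Turn each sample into the set of taxa present with at least min_count reads."""
    return [
        frozenset(taxon for taxon, count in counts.items() if count >= min_count)
        for counts in table.values()
    ]

def frequent_itemsets(transactions, min_support, max_len=3):
    """Brute-force search for itemsets whose support is at least min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for k in range(1, max_len + 1):
        for candidate in combinations(items, k):
            itemset = frozenset(candidate)
            # Support = fraction of samples that contain every taxon in the itemset.
            support = sum(itemset <= t for t in transactions) / n
            if support >= min_support:
                result[itemset] = support
    return result

transactions = to_transactions(taxa_table)
patterns = frequent_itemsets(transactions, min_support=0.5)
for itemset, support in sorted(patterns.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), support)
```

Exhaustive enumeration is only workable for a handful of taxa; real datasets would likely need a dedicated miner, and the choice of presence threshold is one of the input decisions the abstract flags as consequential.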

Author(s):  
Yun Sing Koh ◽  
Russel Pears ◽  
Gillian Dobbie

Association rule mining discovers relationships among items in a transactional database. Most approaches assume that all items within a dataset have a uniform distribution with respect to support. However, this is not always the case, and weighted association rule mining (WARM) was introduced to assign importance to individual items. Previous approaches to the weighted association rule mining problem require users to assign weights to items. In certain cases, it is difficult to provide weights for all items within a dataset. In this paper, the authors propose a method based on a novel Valency model that automatically infers item weights from interactions between items. The authors' experiments show that the weighting scheme results in rules that better capture the natural variation that occurs in a dataset when compared with a miner that does not employ a weighting scheme. The authors applied the model in a real-world application to mine text from a given collection of documents. The use of item weighting enabled the authors to attach more importance to terms that are distinctive. The results demonstrate that keyword discrimination via item weighting leads to informative rules.
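To make the contrast with unweighted mining concrete, the sketch below uses user-assigned weights, as in the earlier approaches the abstract mentions. It does not implement the Valency model, whose details are not given here. The formulation (mean item weight times ordinary support), the toy transactions, and the weights are assumptions chosen for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def weighted_support(itemset, transactions, weights):
    """Assumed formulation: mean weight of the items times ordinary support."""
    itemset = frozenset(itemset)
    mean_weight = sum(weights[item] for item in itemset) / len(itemset)
    return mean_weight * support(itemset, transactions)

# Hypothetical transactions and hand-assigned weights (higher = more important).
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]
weights = {"bread": 0.2, "milk": 0.3, "butter": 0.9}

for itemset in ({"bread", "milk"}, {"bread", "butter"}):
    print(sorted(itemset),
          support(itemset, transactions),
          weighted_support(itemset, transactions, weights))
```

With these invented weights, the rarer pair containing the heavily weighted item overtakes the more frequent pair once weights are applied, which is the kind of reordering that item weighting is meant to produce.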


2019 ◽  
Vol 203 ◽  
pp. 107395 ◽  
Author(s):  
Konstantinos Vougas ◽  
Theodore Sakellaropoulos ◽  
Athanassios Kotsinas ◽  
George-Romanos P. Foukas ◽  
Andreas Ntargaras ◽  
...  

2021 ◽  
Author(s):  
Erna Hikmawati ◽  
Nur Ulfa Maulidevi ◽  
Kridanto Surendro

The process of extracting data to obtain useful information is known as data mining. One of the promising and widely used techniques for this extraction process is association rule mining. This technique is used to identify interesting relationships between sets of items in a dataset and to predict associative behavior for new data. The first step in association rule mining is determining the frequent itemsets that will be involved in the rule formation process. In this step, a threshold known as the minimum support is used to eliminate items that do not belong to a frequent itemset. The threshold also plays an important role in determining the number of rules generated. However, setting the wrong threshold causes association rule mining to fail to obtain rules. Currently, the minimum support value is determined by the user, which is a challenge that becomes worse for a user who is unfamiliar with the dataset characteristics. In this study, a method was proposed to determine the minimum support value based on the characteristics of the dataset. This required certain criteria to be used as thresholds, which led to more adaptive rules according to the needs of the user. The results of this study showed that 6 of 8 datasets obtained a rule with lift ratio > 1 using the minimum threshold value determined through this method.
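The quantities named in this abstract (support, the minimum support threshold, and the lift ratio) follow standard definitions, sketched below. The study's method for choosing the threshold is not described here and is not reproduced; the toy transactions and the function names are assumptions.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    supp_a = support(a, transactions)
    supp_c = support(c, transactions)
    supp_rule = support(a | c, transactions)
    confidence = supp_rule / supp_a if supp_a else 0.0
    # Lift > 1 means the antecedent makes the consequent more likely than its baseline.
    lift = confidence / supp_c if supp_c else 0.0
    return {"support": supp_rule, "confidence": confidence, "lift": lift}

# Hypothetical toy transactions.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]

rule = rule_metrics({"butter"}, {"bread"}, transactions)
print(rule)

# The same rule survives or disappears depending on the minimum support chosen.
for min_support in (0.3, 0.5):
    print(min_support, rule["support"] >= min_support)
```

In this toy data the rule has a lift above 1 but a support of only 0.4, so a minimum support of 0.5 would discard it before lift is ever evaluated. That is the failure mode a poorly chosen threshold produces.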


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 166815-166822
Author(s):  
Guanghui Fan ◽  
Wenjuan Shi ◽  
Liang Guo ◽  
Jun Zeng ◽  
Kaixuan Zhang ◽  
...  

2021 ◽  
pp. 241-253
Author(s):  
Alexandar Vincent-Paulraj ◽  
Girvan Burnside ◽  
Frans Coenen ◽  
Munir Pirmohamed ◽  
Lauren Walker

2019 ◽  
Vol 8 (S2) ◽  
pp. 9-12
Author(s):  
R. Smeeta Mary ◽  
K. Perumal

Finding frequent itemsets is one of the essential topics in data mining, which helps decision makers identify the most useful knowledge. Frequent itemset generation is the prerequisite for association rule mining and its most time-consuming step. In this paper we suggest a new algorithm for frequent itemset detection that works with datasets in a distributed manner. The proposed algorithm introduces a new method for finding frequent itemsets without the need to create candidate itemsets. The approach can be implemented using a horizontal representation of transaction datasets and by allocating prime values. It explores all the frequent itemsets present in the input and identifies the maximal frequent itemset according to the support. It was applied to different transaction databases and compared with two well-known algorithms, FP-Growth and Parallel Apriori, at different support levels. The experiments showed that the proposed algorithm attains a major time improvement over both algorithms.
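The abstract does not spell out how prime values are allocated, so the following is only one plausible reading, stated as an assumption: each item gets a distinct prime, each transaction (one row in the horizontal layout) is stored as the product of its items' primes, and an itemset is contained in a transaction exactly when its product divides the transaction's code. The sketch covers the encoding and support counting only, not the candidate-free search or the distributed execution, and all names in it are hypothetical.

```python
from math import prod

def first_primes(n):
    """Return the first n primes by trial division (adequate for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def encode(transactions):
    """Assign one prime per item and encode each transaction as a product of primes."""
    items = sorted(set().union(*transactions))
    prime_of = dict(zip(items, first_primes(len(items))))
    codes = [prod(prime_of[item] for item in t) for t in transactions]
    return prime_of, codes

def support_count(itemset, prime_of, codes):
    """Count transactions whose code is divisible by the itemset's prime product."""
    key = prod(prime_of[item] for item in itemset)
    return sum(code % key == 0 for code in codes)

# Hypothetical toy transactions in horizontal form (one row per transaction).
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
prime_of, codes = encode(transactions)
print(prime_of, codes)
print(support_count({"a", "c"}, prime_of, codes))
```

Because prime factorizations are unique, the divisibility test is equivalent to a subset test. The trade-off is that the products grow quickly with transaction length, which Python's arbitrary-precision integers absorb but fixed-width integers would not.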

