Generation and Analysis of Tree Structures Based on Association Rules and Hierarchical Clustering

Author(s): Mihaela Vranic, Damir Pintar, Zoran Skocir

2004, Vol 8 (1), pp. 25-51
Author(s): Frans Coenen, Graham Goulbourne, Paul Leng

Author(s): Jéssica de Farias Pereira, Ana Estela Antunes da Silva

This project explores a set of association rules using hierarchical clustering as a way of preprocessing the rules. The goal is to obtain a limited set of association rules by creating groups that indicate promising rules. The data used for both the clustering and the construction of the rules are parameters of solar flares. The following steps were carried out: (i) choose a data set; (ii) preprocess the data to transform categorical data into binary data; (iii) create groups by hierarchical clustering; (iv) select groups from which to create association rules; (v) generate the rules by applying an association algorithm; (vi) validate the association rules.
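Steps (ii) to (v) can be sketched in plain Python as follows. This is a minimal, hypothetical illustration, not the project's implementation: the toy records stand in for the solar-flare parameters, clustering the binary attribute columns (rather than the records) with average linkage on Jaccard distance is an assumption, and the `threshold`, `min_support` and `min_confidence` values are arbitrary. Rules are enumerated by brute force inside one selected group instead of with a specific association algorithm, and step (vi), validation, is not shown.

```python
from itertools import combinations

# Hypothetical toy records with categorical attributes (NOT the solar-flare data used in the project).
records = [
    {"class": "C", "size": "large", "activity": "high"},
    {"class": "C", "size": "large", "activity": "high"},
    {"class": "B", "size": "small", "activity": "low"},
    {"class": "B", "size": "small", "activity": "low"},
    {"class": "C", "size": "large", "activity": "low"},
]


def binarize(rows):
    """Step (ii): turn every attribute=value pair into a binary item.
    Returns the transactions (item set per row) and the binary columns
    (item -> set of row indices where that item is 1)."""
    transactions = [{f"{k}={v}" for k, v in row.items()} for row in rows]
    columns = {}
    for idx, items in enumerate(transactions):
        for item in items:
            columns.setdefault(item, set()).add(idx)
    return transactions, dict(sorted(columns.items()))


def jaccard_distance(a, b):
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def agglomerative(columns, threshold):
    """Step (iii): average-linkage clustering of the binary columns on Jaccard distance.
    Merging stops once the closest pair of clusters is farther apart than `threshold`."""
    clusters = [[name] for name in columns]

    def avg_dist(c1, c2):
        return sum(jaccard_distance(columns[x], columns[y]) for x in c1 for y in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_dist(clusters[p[0]], clusters[p[1]]))
        if avg_dist(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: j > i, so index i is unaffected
    return clusters


def rules_in_group(transactions, group, min_support, min_confidence):
    """Steps (iv)-(v): enumerate rules lhs -> rhs using only the items of one selected group
    (brute force here; an algorithm such as Apriori would prune the search)."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    found = []
    for size in range(2, len(group) + 1):
        for itemset in combinations(sorted(group), size):
            s = support(set(itemset))
            if s == 0 or s < min_support:
                continue
            for k in range(1, size):
                for lhs in combinations(itemset, k):
                    rhs = tuple(x for x in itemset if x not in lhs)
                    confidence = s / support(set(lhs))
                    if confidence >= min_confidence:
                        found.append((lhs, rhs, s, confidence))
    return found


transactions, columns = binarize(records)
groups = agglomerative(columns, threshold=0.5)
rules = rules_in_group(transactions, groups[0], min_support=0.4, min_confidence=0.9)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}  support={s:.2f} confidence={c:.2f}")
```

With these toy values the six binary items split into two groups, and rules such as `('class=C',) -> ('size=large',)` are mined only inside the first group; restricting rule generation to selected groups is what keeps the resulting rule set small.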


Author(s): Wolfgang Kaisers, Holger Schwender, Heiner Schaal

We apply hierarchical clustering (HC) of DNA k-mer counts to multiple Fastq files. The tree structures produced by HC may reflect experimental groups and thereby indicate experimental effects, whereas clustering by preparation group indicates the presence of batch effects. Hence, HC of DNA k-mer counts may serve as an unspecific diagnostic device. To provide a simple, readily applicable tool, we implemented sequential, low-memory analysis of Fastq reads in an R package (seqTools) available on Bioconductor. The approach is validated by the analysis of Fastq file batches containing RNAseq data. Analysis of three Fastq batches downloaded from ArrayExpress indicated experimental effects. Analysis of RNAseq data from two cell types (dermal fibroblasts and Jurkat cells) sequenced in our facility indicates the presence of batch effects. The observed batch effects were also present in reads mapped to the human genome and in reads filtered for high quality (Phred > 30). We propose that hierarchical clustering of DNA k-mer counts provides an unspecific diagnostic tool and a quality criterion for RNAseq experiments.


2018, Vol 19 (11), pp. 3687
Author(s): Wolfgang Kaisers, Holger Schwender, Heiner Schaal

We apply hierarchical clustering (HC) of DNA k-mer counts to multiple Fastq files. The tree structures produced by HC may reflect experimental groups and thereby indicate experimental effects, whereas clustering by preparation group indicates the presence of batch effects. Hence, HC of DNA k-mer counts may serve as a diagnostic device. To provide a simple, readily applicable tool, we implemented sequential, low-memory analysis of Fastq reads in an R package (seqTools) available on Bioconductor. The approach is validated by the analysis of Fastq file batches containing RNAseq data. Analysis of three Fastq batches downloaded from ArrayExpress indicated experimental effects. Analysis of RNAseq data from two cell types (dermal fibroblasts and Jurkat cells) sequenced in our facility indicates the presence of batch effects. The observed batch effects were also present in reads mapped to the human genome and in reads filtered for high quality (Phred > 30). We propose that hierarchical clustering of DNA k-mer counts provides an unspecific diagnostic tool for RNAseq experiments. Further exploration is required when samples are identified as outliers in HC-derived trees.
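The core idea, counting overlapping k-mers per Fastq file and clustering the resulting count profiles into a tree, can be sketched in stdlib Python. This is not the seqTools API (seqTools is an R package) and not the authors' implementation: the toy reads, the choice of k, the normalisation to relative frequencies, the Euclidean distance and the average linkage are illustrative assumptions, and all helper names are hypothetical. Reads are parsed one Fastq record at a time, in the spirit of the sequential, low-memory analysis described above.

```python
import io
import math
from itertools import combinations, product


def fastq_sequences(handle):
    """Yield the sequence line of each 4-line Fastq record, one record at a time."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line (ignored in this sketch)
        yield seq


def kmer_counts(reads, k):
    """Count overlapping DNA k-mers over all reads; k-mers with non-ACGT bases (e.g. N) are skipped."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
    return counts


def profile(counts):
    """Relative k-mer frequencies in a fixed order, so files of different depth are comparable."""
    total = sum(counts.values()) or 1
    return [counts[kmer] / total for kmer in sorted(counts)]


def cluster_tree(profiles):
    """Average-linkage agglomerative clustering on Euclidean distance; returns the tree as nested tuples."""
    nodes = [(name, [name]) for name in profiles]  # (subtree, member sample names)

    def linkage(m1, m2):
        return sum(math.dist(profiles[a], profiles[b]) for a in m1 for b in m2) / (len(m1) * len(m2))

    while len(nodes) > 1:
        i, j = min(combinations(range(len(nodes)), 2),
                   key=lambda p: linkage(nodes[p[0]][1], nodes[p[1]][1]))
        merged = ((nodes[i][0], nodes[j][0]), nodes[i][1] + nodes[j][1])
        nodes = [n for idx, n in enumerate(nodes) if idx not in (i, j)] + [merged]
    return nodes[0][0]


def toy_fastq(seqs):
    """Build a hypothetical in-memory Fastq file with constant quality 'I'."""
    return io.StringIO("".join(f"@read{i}\n{s}\n+\n{'I' * len(s)}\n" for i, s in enumerate(seqs)))


# Hypothetical toy samples: two "batches" with GC-rich and AT-rich reads.
samples = {
    "A1": toy_fastq(["GCGCGCGC", "GCGCGGCC"]),
    "A2": toy_fastq(["GCGCGCGG", "CCGCGCGC"]),
    "B1": toy_fastq(["ATATATAT", "AATTATAT"]),
    "B2": toy_fastq(["ATATATTA", "TATATATA"]),
}
profiles = {name: profile(kmer_counts(fastq_sequences(fh), k=2)) for name, fh in samples.items()}
tree = cluster_tree(profiles)
print(tree)  # (('A1', 'A2'), ('B1', 'B2'))
```

In this toy example the two hypothetical batches end up in separate subtrees. In real data, a top-level split that follows preparation groups rather than experimental groups is the pattern the abstract interprets as a batch effect.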

