Improve consensus partitioning via a hierarchical procedure

2021 ◽  
Author(s):  
Zuguang Gu ◽  
Daniel Huebschmann

Consensus partitioning is an unsupervised method widely used in high-throughput data analysis to reveal subgroups and to assess the stability of the classification. However, standard consensus partitioning procedures are poor at identifying large numbers of stable subgroups, for two main reasons: (1) subgroups with small differences are difficult to separate when they are detected simultaneously with subgroups with large differences, and (2) the stability of the classification generally decreases as the number of subgroups increases. In this work, we propose a new strategy that addresses both issues by applying consensus partitioning in a hierarchical procedure. We demonstrate that hierarchical consensus partitioning can efficiently reveal more subgroups. We also test its performance in revealing a large number of subgroups with a DNA methylation dataset. Hierarchical consensus partitioning is implemented in the R package cola, which provides comprehensive functionality for analysis and visualization. The analysis can also be automated with as few as two lines of code, producing a detailed HTML report of the complete analysis.

2020 ◽  
Author(s):  
Erfan Sharifi ◽  
Niusha Khazaei ◽  
Nicholas Kieran ◽  
Sahel Jahangiri Esfahani ◽  
Abdulshakour Mohammadnia ◽  
...  

Author(s):  
Andreas Quandt ◽  
Sergio Maffioletti ◽  
Cesare Pautasso ◽  
Heinz Stockinger ◽  
Frederique Lisacek

Proteomics is currently one of the most promising fields in bioinformatics as it provides important insights into the protein function of organisms. Mass spectrometry is one of the techniques to study the proteome, and several software tools exist for this purpose. The authors provide an extendable software platform called swissPIT that combines different existing tools and exploits Grid infrastructures to speed up the data analysis process for the proteomics pipeline.


Gene ◽  
2021 ◽  
pp. 146111
Author(s):  
Erfan Sharifi ◽  
Niusha Khazaei ◽  
Nicholas W. Kieran ◽  
Sahel Jahangiri Esfahani ◽  
Abdulshakour Mohammadnia ◽  
...  

2013 ◽  
Vol 2013 ◽  
pp. 1-11 ◽  
Author(s):  
Dongmei Li ◽  
Timothy D. Dye

Resampling-based multiple testing procedures are widely used in genomic studies to identify differentially expressed genes and to conduct genome-wide association studies. However, the power and stability properties of these popular resampling-based multiple testing procedures have not been extensively evaluated. Our study focuses on investigating the power and stability of seven resampling-based multiple testing procedures frequently used in high-throughput data analysis for small sample size data through simulations and gene oncology examples. The bootstrap single-step minP procedure and the bootstrap step-down minP procedure perform the best among all tested procedures, when sample size is as small as 3 in each group and either familywise error rate or false discovery rate control is desired. When sample size increases to 12 and false discovery rate control is desired, the permutation maxT procedure and the permutation minP procedure perform best. Our results provide guidance for high-throughput data analysis when sample size is small.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Xinzhou Ge ◽  
Yiling Elaine Chen ◽  
Dongyuan Song ◽  
MeiLu McDermott ◽  
Kyla Woyshner ◽  
...  

High-throughput biological data analysis commonly involves identifying features such as genes, genomic regions, and proteins, whose values differ between two conditions, from numerous features measured simultaneously. The most widely used criterion to ensure the analysis reliability is the false discovery rate (FDR), which is primarily controlled based on p-values. However, obtaining valid p-values relies on either reasonable assumptions of data distribution or large numbers of replicates under both conditions. Clipper is a general statistical framework for FDR control without relying on p-values or specific data distributions. Clipper outperforms existing methods for a broad range of applications in high-throughput data analysis.

